Abstract: We give a self-contained introduction to the theory of directed graphs, leading
up to the relationship between the Perron-Frobenius eigenvectors of a graph and its autocat- 
alytic sets. Then we discuss a particular dynamical system on a fixed but arbitrary graph, that 
describes the population dynamics of species whose interactions are determined by the graph. 
The attractors of this dynamical system are described as a function of graph topology. Finally 
we consider a dynamical system in which the graph of interactions of the species coevolves 
with the populations of the species. We show that this system exhibits complex dynamics 
including self-organization of the network by autocatalytic sets, growth of complexity and 
structure, and collapse of the network followed by recoveries. We argue that a graph theoretic 
classification of perturbations of the network is helpful in predicting the future impact of a 
perturbation over short and medium time scales. 



0.1 Introduction 

Studies of networks are useful at several different levels (for recent reviews see ^, ^|, Q]). 
At one level one is interested in describing the structure of natural and man-made networks 
such as food webs in ecosystems, biochemical and neural networks in organisms, networks of 
social interaction among agents in societies, and technological networks like the internet, etc. 
A useful representation of a network is a graph (and its generalizations) where the components 
of the network (which could be species, neurons, agents, etc.) are represented by nodes, and 
their mutual interactions by the links of the graph. Graph theory provides important tools to 
capture various aspects of the network structure. 

At a second level one wants to know how the network structure of the system influences 
what happens in the system. E.g., the food-web structure of an ecosystem affects the dynamics 
of populations of the species, the network of human contacts influences the spread of a con- 
tagious disease, etc. At this level of discussion the network is typically taken to be static on 
the time scales of interest; the prime concern is the dynamics of other variables on a network 
with some particular type of (fixed) structure. Here dynamical systems theory is a major tool, 
and network variables (like the adjacency matrix elements of the underlying graph) appear as 
fixed parameters in the dynamics of other system variables like population, etc. 

At a third level one is interested in how networks themselves change with time. Biochem- 
ical, neural, ecological, social and technological networks are not static, but are products of 
evolution. Moreover this evolution is quite complex in real systems. Networks sometimes
self-organize and grow in size and complexity, and sometimes disintegrate. Their evolution
is usually intertwined with other system variables, e.g., a food-web influences populations of
species, and if a species goes extinct, the food-web changes. Understanding the processes and
mechanisms involved in the evolution of complex networks is a major intellectual challenge. 

A problem that illustrates all these levels is the problem of the origin of life on earth. 
The simplest living structure that we know — a bacterial cell — is a complex collection 
of several thousand types of molecules interacting with each other in a complex network of 
chemical interactions. The network may be described by a graph in which the nodes repre- 
sent the molecular types or molecular species, and links connecting nodes represent chemical 
interactions between the molecular species. By participating in specific chemical reactions 
each molecular species or node plays a rather definite functional role in the organization of 
the cell: it permits or creates certain specific processes or spatial structures. Note that the 
complex chemical network of a cell is needed to produce the processes and structures that 
exist in it, and conversely, the same processes and structures are essential for maintaining the 
network and allowing it to evolve. If we assume that life originated on earth about 3.5 to 3.8 
billion years ago as suggested by the microfossil evidence, then about 4 billion years back 
there was neither such a complex network of interactions nor such processes and structures 
existing anywhere on the earth. One of the puzzles of the origin of life on earth is: how did 
the network and the processes and spatial structures bootstrap themselves into existence when 
none was present — how did a chemical 'organization' emerge with individual molecular 
species playing definite roles in it? 

A second puzzle concerns the highly 'structured' nature of the organization. The molecules 
appearing in cells are very special (a small subset in a very large space of possible molecules) 
and so is the graph that describes their interactions (a special kind of graph in the very large 
space of graphs). The probability of such structures arising by pure chance is astronomically 
small. If we assume that it was not an unlikely chance event that created life, we are led 
to the question: what then are the mechanisms that can create highly structured or 'ordered' 
organizations? A similar question is relevant for economic and social networks. 

In order to address such questions in a mathematical model, one is naturally led to dynam- 
ical systems in which the graph describing the network is also a dynamical variable, whose 
dynamics is coupled to that of other variables such as the population of the molecular species. 
Here we present a model with such a structure, which has been inspired by the work in refs.
|g, ^ ||, [Kj] ■ The analysis of such dynamical systems is facilitated by the development of 
some new tools in graph theory. Another purpose of this article is to discuss some of these new 
tools. Together, the model and these tools address the above two questions about the origin 
of life, and provide partial answers. The model exhibits a mechanism by which a chemical 
organization can emerge where none existed through the formation of small autocatalytic sets 
of molecular species. In the model we also observe a self-organizing process which results 
in the growth of the initial autocatalytic set into a complex and highly structured chemical 
organization in a short time. 

In addition, the model also captures, in an analytically tractable form, several phenomena 
that one associates with the evolution of other biological and social systems. These include 
emergence of cooperation and interdependence in the system; crashes and recoveries of the 
system as a whole; 'core-shifts'; appearance of 'keystone species'; etc. We also argue that 
the juxtaposition of graph theory and dynamical systems provides the possibility of formulating
more precisely notions that are important and useful in everyday language but otherwise
difficult to pin down. In particular we attempt to formulate the notion of 'innovation' in this 
dynamical system, and classify innovations into categories according to their graph theoretic 
structure. It turns out that different categories of innovation have different short and longer 
term impact on the dynamics of the system. 

This article is organized roughly according to the three kinds of network studies indicated 
above. In section 2 we discuss aspects of graph theory in a self-contained manner, reviewing 
older results as well as recent work. Among other things we describe a relation between topo- 
logical properties of a graph (namely its autocatalytic sets) and its algebraic properties (the 
structure of the eigenvectors of its adjacency matrix). In section 3 we discuss a simple dynam- 
ical system describing molecular population dynamics on a fixed interaction graph. Here we 
show how the structure of the graph influences the dynamics of the system, in particular relating
the nature of its attractors to graph topology. Section 4 describes a model of graph evolution, 
motivated by the origin of life problem. In section 5 we show that the dynamics of this model 
exhibits self-organization and growth of cooperation and structure in the network, with ana- 
lytical estimates of the time scales involved. Section 6 discusses the phenomena of crashes 
and recoveries exhibited by the model. In this section we also formulate a definition of inno- 
vation that seems appropriate for this model, and discuss a hierarchy of different categories 
of innovation and the roles they play in the ups and downs of the system. Finally, section 7 
contains a discussion of some limitations of the model, speculations regarding the origin of 
life problem and possible future directions. 



0.2 Graph theory and autocatalytic sets 
0.2.1 Directed graphs and their adjacency matrices 

A directed graph $G = G(S, L)$, often referred to in the sequel as simply a graph, is defined
by a set $S$ of 'nodes' and a set $L$ of 'links' (or 'arcs'), where each link is an ordered pair of
nodes [11]. It is convenient to label the set of nodes by integers, $S = \{1, 2, \ldots, s\}$ for a
graph of $s$ nodes. An example of a graph is given in Figure 1a where each node is represented
by a small labeled circle, and a link $(j, i)$ is represented by an arrow pointing from node $j$ to
node $i$. A graph with $s$ nodes is completely specified by an $s \times s$ matrix, $C = (c_{ij})$, called
the adjacency matrix of the graph, and vice versa. The matrix element in the $i$th row and
$j$th column of $C$, $c_{ij}$, equals unity if $L$ contains a directed link $(j, i)$ (arrow pointing from
node $j$ to node $i$), and zero otherwise. (This convention differs from the usual one where
$c_{ij} = 1$ if and only if there is a link from node $i$ to node $j$; our adjacency matrix is the
transpose of the usual one. We have chosen this convention because it is more natural in the
context of the dynamical system to be discussed in subsequent sections.) Figure 1b shows the
adjacency matrix corresponding to the graph in Figure 1a. We will use the terms 'graph' and
'adjacency matrix' interchangeably: the phrase 'a graph with adjacency matrix $C$' will often
be abbreviated to 'a graph $C$'. Undirected graphs are special cases of directed graphs whose
adjacency matrices are symmetric. A single (undirected) link of an undirected graph between,
say, nodes $j$ and $i$, can be viewed as two directed links of a directed graph, one from $j$ to $i$ and
the other from $i$ to $j$.
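
The transposed convention is easy to get wrong in practice, so a minimal Python sketch may help. It assumes 0-based node labels (the text uses 1 to s), and the helper names `adjacency_matrix` and `is_undirected` as well as the 3-node example are illustrative choices, not taken from the figures.

```python
def adjacency_matrix(s, links):
    """Return the s x s matrix C with C[i][j] = 1 iff there is a link j -> i.

    Nodes are numbered 0..s-1 here; each link is an ordered pair (j, i)
    meaning an arrow pointing from node j to node i.
    """
    C = [[0] * s for _ in range(s)]
    for j, i in links:
        C[i][j] = 1  # row index = target node, column index = source node
    return C


def is_undirected(C):
    """An undirected graph corresponds to a symmetric adjacency matrix."""
    s = len(C)
    return all(C[i][j] == C[j][i] for i in range(s) for j in range(s))


# Hypothetical example: a 2-cycle between nodes 0 and 1, plus a link 1 -> 2.
links = [(0, 1), (1, 0), (1, 2)]
C = adjacency_matrix(3, links)
for row in C:
    print(row)
```

Row 2 of the result has its single non-zero entry in column 1, reflecting the link from node 1 to node 2.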

A graph $G' = G'(S', L')$ is called a subgraph of $G(S, L)$ if $S' \subset S$ and $L' \subset L$. We will
use the term 'subgraph' if $G'$ satisfies a stronger property: every link in $L$ with both endpoints
in $S'$ also belongs to $L'$. That is, for us, a subgraph will be a subset of nodes together with
all their mutual links. (This is often called an 'induced subgraph' in the literature [12].) The
graph in Figure 1c (comprising nodes 14, 15, 16, 17, 18 and 19 and all their mutual links) is
thus a subgraph of the graph in Figure 1a. For a subgraph we will often find it more convenient
to label the nodes not by integers starting from 1, but by the same labels the corresponding
nodes had in the parent graph. The adjacency matrix of a subgraph can be obtained by deleting
all the rows and columns from the full adjacency matrix that correspond to the nodes outside
the subgraph. The highlighted portion of the matrix in Figure 1b is the adjacency matrix of
the subgraph in Figure 1c.

A walk of length $n$ (from node $i_1$ to node $i_{n+1}$) is an alternating sequence of nodes and
links $i_1 l_1 i_2 l_2 \cdots i_n l_n i_{n+1}$ such that link $l_1$ points from node $i_1$ to node $i_2$ (or $l_1 = (i_1, i_2)$),
$l_2$ points from $i_2$ to $i_3$ and so on. A walk with all nodes distinct (except possibly the first and
last nodes) will be called a path. If the first and last nodes $i_1$ and $i_{n+1}$ of a walk or path are
the same, it will be referred to as a closed walk or path. The existence of even one closed
walk in the graph implies the existence of an infinite number of distinct walks in the graph.
In the graph of Figure 1a, there is an infinite number of walks from node 11 to node 17 (e.g.,
11 -> 12 -> 14 -> 17, 11 -> 12 -> 11 -> 12 -> 14 -> 17, ...) but no walks from node 11 to
node 10. An undirected graph trivially has closed walks if it has any undirected links at all.

In the graph theory literature, what we have defined above to be a 'closed path' is usually
referred to as a 'cycle'. However, for later convenience, we define a cycle somewhat differently.
We define an $n$-cycle to be a subgraph with $n \geq 1$ nodes which contains exactly $n$ links
and also contains a closed path that covers all $n$ nodes. E.g., the subgraph formed by node 20
and its self link is a 1-cycle, that formed by nodes 1 and 2 is a 2-cycle and by nodes 3, 4 and
5 a 3-cycle. The subgraph formed by nodes 1, 2, 3, 4 and 5 is not a 5-cycle because it does not
have a closed path covering all the five nodes. The word 'cycle' will be used generically for
an $n$-cycle of unspecified length.
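
The definition of an n-cycle can be checked mechanically. The sketch below is one possible way to do so, assuming 0-based node labels and the convention that `C[i][j] == 1` means a link from j to i; the function name `is_n_cycle` and the small example graph are illustrative only.

```python
def is_n_cycle(C, nodes):
    """Check whether the induced subgraph on `nodes` is an n-cycle.

    The subgraph must contain exactly n = len(nodes) links and a closed
    path that covers all n nodes. C[i][j] == 1 means a link j -> i.
    """
    nodes = list(nodes)
    n = len(nodes)
    if n == 0:
        return False
    # Count the links that have both endpoints inside the node set.
    n_links = sum(C[i][j] for i in nodes for j in nodes)
    if n_links != n:
        return False
    # Follow successors starting from the first node. A closed path through
    # all nodes requires each node to have exactly one successor in the set.
    current, visited = nodes[0], []
    for _ in range(n):
        successors = [i for i in nodes if C[i][current] == 1]
        if len(successors) != 1:
            return False
        visited.append(current)
        current = successors[0]
    return current == nodes[0] and len(set(visited)) == n


# Hypothetical example: 2-cycle 0 <-> 1, link 1 -> 2, and a self link on node 3.
C = [[0, 1, 0, 0],
     [1, 0, 0, 0],
     [0, 1, 0, 0],
     [0, 0, 0, 1]]
print(is_n_cycle(C, [0, 1]), is_n_cycle(C, [3]), is_n_cycle(C, [0, 1, 2]))
```

In this example the node set {0, 1, 2} has three links but no closed path through node 2, so it is not a 3-cycle, which mirrors the five-node counterexample in the text.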

Given a directed graph $C$, its associated undirected graph (or 'symmetrized version') $C^{(s)}$
can be obtained by adding additional links as follows: for every link $(j, i)$ in $L$, add another
link $(i, j)$ if the latter is not already in $L$. Two nodes of a directed graph $C$ will be said to
be connected if there exists a path between them in the associated undirected graph $C^{(s)}$,
and disconnected otherwise. Thus any directed graph can be decomposed into 'connected
components' which are maximal sets of connected nodes (e.g., the graph of Figure 1a has five
connected components that are disconnected from each other). In a directed graph $C$, we refer
to a node $i$ as being 'downstream' from a node $j$ if there is a path in $C$ leading from $j$ to $i$,
and no path from $i$ to $j$. Similarly $i$ is 'upstream' from $j$ if there is a path in $C$ leading from
$i$ to $j$, and no path from $j$ to $i$. Thus in Figure 1a, node 17 is downstream from node 11, or
equivalently node 11 is upstream from node 17. Node 10 is neither upstream nor downstream
from node 11 since they are not connected, and node 12 is neither upstream nor downstream
from 11 because each can be reached from the other along some directed path.
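
The notions of 'connected', 'downstream' and 'upstream' reduce to reachability queries. The following is a small sketch under the same assumptions as before (0-based labels, `C[i][j] == 1` for a link from j to i); the function names and the 4-node example are hypothetical.

```python
def reachable_from(C, start):
    """Nodes reachable from `start` by a directed path of length >= 1."""
    s = len(C)
    seen, stack = set(), [start]
    while stack:
        j = stack.pop()
        for i in range(s):
            if C[i][j] == 1 and i not in seen:  # link j -> i
                seen.add(i)
                stack.append(i)
    return seen


def is_downstream(C, i, j):
    """True if i is downstream from j: a path j -> i exists, but none i -> j."""
    return i in reachable_from(C, j) and j not in reachable_from(C, i)


def is_connected(C, i, j):
    """True if i and j are joined by a path in the symmetrized graph."""
    s = len(C)
    sym = [[max(C[a][b], C[b][a]) for b in range(s)] for a in range(s)]
    return i == j or j in reachable_from(sym, i)


# Hypothetical example: 2-cycle 0 <-> 1, link 1 -> 2, and an isolated node 3.
C = [[0, 1, 0, 0],
     [1, 0, 0, 0],
     [0, 1, 0, 0],
     [0, 0, 0, 0]]
print(is_downstream(C, 2, 0), is_downstream(C, 1, 0), is_connected(C, 3, 0))
```

Node 2 is downstream from node 0, while nodes 0 and 1 are neither upstream nor downstream from each other because each can reach the other.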

If $C$ is the adjacency matrix of a graph then it is easy to see that $(C^n)_{ij}$ equals the number
of distinct walks of length $n$ from node $j$ to node $i$. E.g., $(C^2)_{ij} = \sum_{k=1}^{s} c_{ik} c_{kj}$; each term in
the sum is unity if and only if there exists a link from $j$ to $k$ and from $k$ to $i$; hence the sum
counts the number of walks from $j$ to $i$ of length 2.
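
A short pure-Python sketch of this walk-counting property, assuming 0-based labels and a hypothetical 3-node graph (a 2-cycle between nodes 0 and 1 with an extra link from 1 to 2); the helper names are illustrative.

```python
def mat_mul(A, B):
    """Product of two square integer matrices given as lists of lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def walk_counts(C, n):
    """Return C^n; entry [i][j] counts the walks of length n from j to i."""
    s = len(C)
    P = [[int(i == j) for j in range(s)] for i in range(s)]  # C^0 = identity
    for _ in range(n):
        P = mat_mul(C, P)
    return P


# C[i][j] == 1 means a link j -> i: links 0 -> 1, 1 -> 0 and 1 -> 2.
C = [[0, 1, 0],
     [1, 0, 0],
     [0, 1, 0]]
for n in (2, 3, 4):
    print(n, walk_counts(C, n)[2][0])  # walks from node 0 to node 2
```

In this example there is one walk from node 0 to node 2 for every even length (each extra pass around the 2-cycle adds two steps) and none of odd length.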

Perron-Frobenius eigenvalues and eigenvectors (PFEs)

A vector $x = (x_1, x_2, \ldots, x_s)$ is said to be an eigenvector of an $s \times s$ matrix $C$ with an
eigenvalue $\lambda$ if for each $i$, $\sum_{j=1}^{s} c_{ij} x_j = \lambda x_i$. The eigenvalues of a matrix $C$ are roots of
the characteristic equation of the matrix: $|C - \lambda I| = 0$ where $I$ is the identity matrix of the
same dimensionality as $C$ and $|A|$ is the determinant of the matrix $A$. In general a matrix will
have complex eigenvalues and eigenvectors, but an adjacency matrix of a graph has special
properties, because it is a 'non-negative' matrix, i.e., it has no negative entries.

For any non-negative matrix, the Perron-Frobenius theorem [13, 14] guarantees that there
exists an eigenvalue which is real and larger than or equal to all other eigenvalues in magnitude.
This largest eigenvalue is often called the Perron-Frobenius eigenvalue of the matrix,
which we will denote by $\lambda_1(C)$ for a graph $C$. Further the theorem also states that there exists
an eigenvector of $C$ corresponding to $\lambda_1(C)$ (which we will refer to as a Perron-Frobenius
Eigenvector, PFE) all of whose components are real and non-negative. The Perron-Frobenius
eigenvalue of the graph in Figure 1a is 1. Four PFEs of the graph in Figure 1a are displayed
in Figure 1d.

The presence or absence of closed paths in a graph can be determined from the Perron-Frobenius
eigenvalue of its adjacency matrix (see ref. [16] for a simple proof):

Proposition 1. If a graph, $C$,

(i) has no closed walk then $\lambda_1(C) = 0$,

(ii) has a closed walk then $\lambda_1(C) \geq 1$,

(iii) has a closed walk and all closed walks only occur in subgraphs that are cycles then
$\lambda_1(C) = 1$.

Note that $\lambda_1$ cannot take values between zero and one because of the discreteness of the entries
of $C$ which are either zero or one. (Thus, for an undirected graph, if it has even one undirected
link, $\lambda_1(C) \geq 1$.) Several results pertaining to the relationship of the graph structure to the
structure of its PFEs can be found in ref. [15].
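
Which side of Proposition 1 a graph falls on can be decided without an eigenvalue solver. A walk of length s in a graph of s nodes visits s + 1 nodes and must therefore repeat one, so a graph has a closed walk exactly when $C^s$ is non-zero; equivalently, $\lambda_1(C) = 0$ exactly when $C$ is nilpotent. The sketch below uses this standard fact; the function name and the two example graphs are illustrative assumptions.

```python
def has_closed_walk(C):
    """True iff the graph has a closed walk, i.e. iff C^s is non-zero.

    By Proposition 1, False corresponds to lambda_1 = 0 and True to
    lambda_1 >= 1. C[i][j] == 1 means a link j -> i.
    """
    s = len(C)
    P = [row[:] for row in C]  # non-zero pattern of C^1
    for _ in range(s - 1):
        # Boolean matrix product: only the non-zero pattern of C * P matters.
        P = [[int(any(C[i][k] and P[k][j] for k in range(s)))
              for j in range(s)] for i in range(s)]
    return any(any(row) for row in P)


# Hypothetical examples: a chain 0 -> 1 -> 2 and a 2-cycle 0 <-> 1.
chain = [[0, 0, 0],
         [1, 0, 0],
         [0, 1, 0]]
two_cycle = [[0, 1],
             [1, 0]]
print(has_closed_walk(chain), has_closed_walk(two_cycle))
```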

Irreducible graphs and matrices 

A subgraph of a directed graph is termed irreducible if there is a path within the subgraph
from each node in the subgraph to every other node in the subgraph. The simplest irreducible
subgraph is a 1-cycle. In Figure 1a the subgraph comprising nodes 3, 4 and 5 is irreducible, as
is the subgraph of nodes 6 and 7, but the subgraph of nodes 3, 4, 5, 6 and 7 is not irreducible
since there is, for example, no path from node 6 to node 5.

If a graph or subgraph is irreducible then the corresponding adjacency matrix is also
termed irreducible. Thus a matrix $C$ is irreducible if for every ordered pair of nodes $i$ and
$j$ there exists a positive integer $k$ such that $(C^k)_{ij} > 0$. Refs. [13, 14] describe further
properties of irreducible matrices.

The nodes of any graph can be grouped into a unique set of irreducible subgraphs as 
follows: 

(1) Pick any node, say $i$. Find all the nodes which have paths leading to them starting at
$i$. Denote this set by $S_1$; it may include $i$ itself. Similarly find all the nodes which have
paths leading to $i$. Denote this set by $S_2$. Denote the subgraph formed by the set of nodes
$\{i\} \cup (S_1 \cap S_2)$ and all their mutual links as $C_1$. If $S_1 \cap S_2 \neq \Phi$ ($\Phi$ denotes the empty set),
then $C_1$ is an irreducible graph because every node of $C_1$ has a path within $C_1$ to every other
node in it. If $S_1 \cap S_2 = \Phi$ then $i$ does not belong to any irreducible subgraph and $C_1$ consists
of just the node $i$ and no links.

(2) Pick another node which is not in $C_1$ and repeat the procedure with that node to get another
subgraph, $C_2$. The sets of nodes comprising the two subgraphs will be disjoint.

(3) Repeat this process until all nodes have been placed in some $C_\alpha$, $\alpha = 1, 2, \ldots, M$. Each
$C_\alpha$ is either an irreducible subgraph or consists of a single node with no links.

Irrespective of which nodes are picked and in which order, this procedure will produce
for any graph a unique set of disjoint subgraphs (up to labelling of the $C_\alpha$) encompassing all
the nodes of the graph. The graph in Figure 1a will decompose into 14 such subgraphs (see
Figure 1e).
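
Steps (1) to (3) translate almost directly into code. The sketch below follows the procedure literally with two reachability searches per picked node; it assumes 0-based labels and `C[i][j] == 1` for a link from j to i, and the function names and the 5-node example are illustrative. It is not an efficient algorithm for large graphs.

```python
def reachable_from(C, start):
    """Nodes reachable from `start` by a directed path of length >= 1."""
    s = len(C)
    seen, stack = set(), [start]
    while stack:
        j = stack.pop()
        for i in range(s):
            if C[i][j] == 1 and i not in seen:  # link j -> i
                seen.add(i)
                stack.append(i)
    return seen


def decompose(C):
    """Partition the nodes into the subgraphs C_alpha of steps (1)-(3).

    Returns a list of (nodes, irreducible) pairs, where `irreducible` is
    False for a single node that lies on no closed path.
    """
    s = len(C)
    Ct = [[C[j][i] for j in range(s)] for i in range(s)]  # all links reversed
    assigned, parts = set(), []
    for i in range(s):
        if i in assigned:
            continue
        S1 = reachable_from(C, i)    # nodes with paths leading to them from i
        S2 = reachable_from(Ct, i)   # nodes with paths leading to i
        common = S1 & S2
        nodes = sorted({i} | common)
        parts.append((nodes, bool(common)))
        assigned.update(nodes)
    return parts


# Hypothetical example: 0 <-> 1, 1 -> 2, a self link on node 3, isolated node 4.
C = [[0, 1, 0, 0, 0],
     [1, 0, 0, 0, 0],
     [0, 1, 0, 0, 0],
     [0, 0, 0, 1, 0],
     [0, 0, 0, 0, 0]]
print(decompose(C))
```

For this example the result contains two irreducible subgraphs (the 2-cycle and the 1-cycle) and two single nodes without links.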

We say there is a path from an irreducible subgraph $C_1$ to another irreducible subgraph $C_2$
if there is a path in $C$ from any node of $C_1$ to any node of $C_2$. The terms 'downstream' and
'upstream' can thus be used unambiguously for the $C_\alpha$.



Decomposition of a general graph 

A general adjacency matrix can be rewritten in a useful form by renumbering the nodes by the
following procedure [13, 14]:

Determine all the subgraphs $C_1, C_2, \ldots, C_M$ of the graph as described above. Construct a
new graph of $M$ nodes, one node for each $C_\alpha$, $\alpha = 1, \ldots, M$. The new graph has a directed
link from $C_\beta$ to $C_\alpha$ if, in the original graph, any node of $C_\beta$ has a link to any node of $C_\alpha$.
Figure 1e illustrates what this new graph looks like for the graph of Figure 1a.

Clearly the resulting graph cannot have any closed paths. For if it were to have a closed
path then the $C_\alpha$ subgraphs comprising the closed path would together have formed a larger
irreducible subgraph in the first place. Therefore we can renumber the $C_\alpha$ such that if $\alpha > \beta$,
$C_\beta$ is never downstream from $C_\alpha$. Now we can renumber the nodes of the original graph such
that nodes belonging to a given $C_\alpha$ occupy contiguous node numbers, and whenever a pair of
nodes $i$ and $j$ belong to different subgraphs $C_\alpha$ and $C_\beta$ respectively, then $\alpha > \beta$ implies $i > j$.
Such a renumbering is in general not unique, but with any such renumbering the adjacency
matrix takes the following canonical form:



\[
C = \begin{pmatrix}
C_1 &     &        & 0   \\
    & C_2 &        &     \\
    &     & \ddots &     \\
R   &     &        & C_M
\end{pmatrix}
\]

where 0 indicates that the upper block triangular part of the matrix contains only zeroes while
the lower block triangular part, $R$, is not equal to zero in general. It can be seen that the graph
in Figure 1a is already in this canonical form. In Figure 1b, the dotted lines demarcate the
block diagonal portions which correspond to the $C_\alpha$.

From the above form of $C$ it follows that

\[
|C - \lambda I| = |C_1 - \lambda I| \times |C_2 - \lambda I| \times \cdots \times |C_M - \lambda I| .
\]

Therefore the set of eigenvalues of $C$ is the union of the sets of eigenvalues of $C_1, \ldots, C_M$, and
$\lambda_1(C) = \max_\alpha \{\lambda_1(C_\alpha)\}$.

Therefore if a given graph $C$ has a Perron-Frobenius eigenvalue $\lambda_1 > 0$ then it contains
at least one irreducible subgraph with Perron-Frobenius eigenvalue $\lambda_1$. When $\lambda_1 > 0$, all
irreducible subgraphs of $C$ with Perron-Frobenius eigenvalue equal to $\lambda_1$ are referred to as
basic subgraphs. The yellow nodes in Figure 1e correspond to the basic subgraphs of Figure
1a.



0.2.2 Autocatalytic sets 

The concept of an autocatalytic set (ACS) was first introduced in the context of a set of
catalytically interacting molecules. There it was defined to be a set of molecular species which
contains a catalyst for each of its member species [17, 18, 19]. Such a set of molecular
species can collectively self-replicate under certain circumstances even if none of its compo- 
nent molecular species can individually self-replicate. This property is considered important 
in understanding the origin of life. If we imagine a node in a directed graph to represent a 
molecular species and a link from j to i as signifying that j is a catalyst for i, this motivates 
the following graph-theoretic definition of an ACS in any directed graph: An autocatalytic 
set (ACS) is a subgraph, each of whose nodes has at least one incoming link from a node 
belonging to the same subgraph. 
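
The graph-theoretic definition amounts to a one-line test on the adjacency matrix. A minimal sketch, assuming 0-based labels and `C[i][j] == 1` for a link from j to i; the name `is_acs` and the example graph are hypothetical.

```python
def is_acs(C, nodes):
    """True if every node in `nodes` has an incoming link from the same set."""
    nodes = set(nodes)
    return bool(nodes) and all(
        any(C[i][j] == 1 for j in nodes) for i in nodes
    )


# Hypothetical example: a 2-cycle 0 <-> 1 with an extra link 1 -> 2.
C = [[0, 1, 0],
     [1, 0, 0],
     [0, 1, 0]]
print(is_acs(C, [0, 1]), is_acs(C, [0, 1, 2]), is_acs(C, [1, 2]))
```

Here {0, 1} and {0, 1, 2} are ACSs, whereas {1, 2} is not, because node 1 receives its only incoming link from node 0, which lies outside that set.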

Figure 2 shows various ACSs. The simplest ACS is a 1-cycle; Figure 2a. There is the following
hierarchical relationship between cycles, irreducible subgraphs and ACSs: all cycles
are irreducible subgraphs and all irreducible subgraphs are ACSs, but not all ACSs are
irreducible subgraphs and not all irreducible subgraphs are cycles. Figures 2a and 2b are graphs
that are irreducible as well as cycles, 2c is an ACS that is not an irreducible subgraph and
hence not a cycle, while 2d and 2e are examples of irreducible graphs that are not cycles. It is
not difficult to see the following [16]:



Proposition 2.

(i) An ACS must contain a closed path. Consequently,

(ii) If a graph $C$ has no ACS then $\lambda_1(C) = 0$.

(iii) If a graph $C$ has an ACS then $\lambda_1(C) \geq 1$.



Relationship between autocatalytic sets and Perron-Frobenius eigenvectors 

The ACS is a useful graph-theoretic construct in part because of its connection with the PFE.
Let $x$ be a PFE of a graph. Consider the set of all nodes $i$ for which $x_i$ is non-zero. We will
call the subgraph of all these nodes and their mutual links the 'subgraph of the PFE $x$'. If all
the components of the PFE are non-zero then the subgraph of the PFE is the entire graph. For
example the subgraph of the PFE $e_3$ mentioned in Figure 1d is the graph shown in Figure 1c.
One can show that [16]

Proposition 3.

If $\lambda_1(C) > 0$, then the subgraph of any PFE of $C$ is an ACS.

For the PFEs of Figure 1d this is immediately verified by inspection. Note that this result
relates an algebraic property of a graph, its PFE, to a topological structure, an ACS. Further,
this result is not true if we considered irreducible graphs instead of ACSs. E.g., the subgraph
of $e_3$, shown in Figure 1c, is not an irreducible graph.
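
Proposition 3, and the remark that it fails for irreducible graphs, can be checked by hand on a small example. The sketch below uses a hypothetical 3-node graph (not one of the figures): a 2-cycle between nodes 0 and 1 with a link from 1 to 2. Its irreducible blocks are the 2-cycle (eigenvalues +1 and -1) and the single node 2 (eigenvalue 0), so $\lambda_1 = 1$, and the vector (1, 1, 1) is supplied by hand as a PFE rather than computed.

```python
# C[i][j] == 1 means a link j -> i: links 0 -> 1, 1 -> 0 and 1 -> 2.
C = [[0, 1, 0],
     [1, 0, 0],
     [0, 1, 0]]
lam = 1          # Perron-Frobenius eigenvalue of this example
x = [1, 1, 1]    # candidate PFE, given by hand
s = len(C)

# Eigenvalue equations: sum_j c_ij x_j = lam * x_i for every i.
Cx = [sum(C[i][j] * x[j] for j in range(s)) for i in range(s)]
is_eigenvector = all(Cx[i] == lam * x[i] for i in range(s))

# Subgraph of the PFE: the nodes with non-zero components.
support = [i for i in range(s) if x[i] != 0]

# ACS test: every support node has an incoming link from the support.
support_is_acs = all(any(C[i][j] for j in support) for i in support)

# Irreducibility fails as soon as one support node has no outgoing link
# to the support while the support has more than one node.
has_dead_end = len(support) > 1 and any(
    not any(C[i][j] for i in support) for j in support
)

print(is_eigenvector, support_is_acs, has_dead_end)
```

Node 2 has no outgoing link, so the subgraph of this PFE is an ACS but cannot be irreducible.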

Note also that the converse of the above statement is not true, i.e., there need not exist a
PFE for every ACS in a given graph. Thus in Figure 1a, nodes 3, 4, 5, 6 and 7 form an ACS but
there is no eigenvector with eigenvalue $\lambda_1$ for which all these and only these components are
non-zero.

Let $x$ be a PFE of a graph $C$, and let $C'$ denote the adjacency matrix of the subgraph of
$x$. Let $\lambda_1(C')$ denote the Perron-Frobenius eigenvalue of $C'$. It is not difficult to see that
$\lambda_1(C') = \lambda_1(C)$. Figure 3 illustrates this point. For the graph in Figure 3a, $\lambda_1 = 1$. Figure 3b
shows a PFE of the graph and how it satisfies the eigenvalue equations. For this PFE, nodes
1, 5 and 6 have $x_i = 0$. Removing these nodes produces the PFE subgraph shown in Figure
3c. Its adjacency matrix, $C'$, is obtained by removing rows 1, 5, 6 and columns 1, 5, 6 from
the original matrix. Figure 3d illustrates that the vector constructed by removing the zero
components of the PFE is an eigenvector of $C'$ with eigenvalue 1. The logic of this example
is easily extended to a general proof that $\lambda_1(C') = \lambda_1(C)$.

We can now perform a graph decomposition of $C'$ into irreducible subgraphs as before;
since $\lambda_1(C') = \lambda_1(C)$, it follows that $C'$ must contain at least one of the basic subgraphs of
$C$. If $C'$ contains only one of the basic subgraphs of $C$ we will refer to $x$ as a simple PFE, and
to $C'$ as a simple ACS. The graph in Figure 1a has only four simple PFEs which are displayed
in Figure 1d. All PFEs of $C$ are linear combinations of its simple PFEs.

Core and periphery of a simple PFE 

If $C'$ is the subgraph of a simple PFE, the basic subgraph of $C$ contained in $C'$ will be called
the core of $C'$ (or equivalently, the 'core of the simple PFE'), and denoted $Q'$. The set of the
remaining nodes and links of $C'$ that are not in its core will together be said to constitute the
periphery of $C'$. For example, for the PFE subgraph in Figure 1c, the core is the 2-cycle comprising
nodes 14 and 15. Note that the periphery is not a subgraph in the sense we are using the
word 'subgraph', since it contains links not just between periphery nodes but also from nodes
outside the periphery (like the link from node 15 to 16 in Figure 1c).

The core and periphery can be shown to have the following topological property (which 
justifies the nomenclature): 

Proposition 4. From every node in the core of (the subgraph of) a simple PFE there exists a
path leading to every other node of the PFE subgraph. From no periphery node is there any
path leading to any core node.

Thus all periphery nodes are downstream from all core nodes. Starting from the core one can 
reach the periphery but not vice versa. 



It follows from the Perron-Frobenius theorem for irreducible graphs [13] that $\lambda_1(Q')$ will
necessarily increase if any link is added to the core. Similarly removing any link will decrease
$\lambda_1(Q')$. Thus $\lambda_1$ measures the multiplicity of internal pathways in the core. Figure 4
illustrates this point.

Core and periphery of a non-simple PFE 

Since any PFE of a graph can be written as a linear combination of a set of simple PFEs (this 
set is unique for any graph), the definitions of core and periphery can be readily extended to 
any PFE as follows: 

The core of a PFE, denoted $Q$, is the union of the cores of those simple PFEs whose linear
combination forms the given PFE. The rest of the nodes and links of the PFE subgraph
constitute its periphery. It follows from the above discussion that $\lambda_1(Q) = \lambda_1(C)$. When the core
is a union of disjoint cycles then $\lambda_1(Q) = 1$, and vice versa.

The structure of PFEs when there is no ACS 

The above discussion about the structure of PFEs was for graphs $C$ with $\lambda_1(C) > 0$. If
$\lambda_1(C) = 0$, the graph has no ACS. Then the structure of PFEs is as follows: there exists
a PFE for every connected component of the graph. Since there are no closed walks in the
graph, all walks have finite lengths. Consider the longest paths in a given connected component.
Identify the nodes that are the endpoints of these longest paths. The PFE corresponding
to the given connected component will have $x_i > 0$ for each of the latter nodes and $x_i = 0$
for all other nodes in the graph. Again a general PFE is a linear combination of all such PFEs,
one for each connected component of the graph. In this case since there is no closed path there
is no core (or periphery) for any PFE of the graph. The core of all PFEs of such a graph may
be defined to be the null set, $Q = \Phi$.



0.3 A dynamical system on a fixed graph 

In the previous section we have discussed the properties of graphs and their associated adja- 
cency matrices, eigenvalues and eigenvectors. In this section we discuss the dynamical signif- 
icance of the same constructs. In particular, we present an example of a dynamical system on 
a fixed graph described by a set of coupled ordinary differential equations, whose attractors 
are precisely the PFEs discussed above. This dynamical system arises as an idealization of 
population dynamics of a set of chemicals. 

Consider the simplex of normalized non-negative vectors in $s$ dimensions:
$J = \{x = (x_1, x_2, \ldots, x_s) \in R^s \mid 0 \leq x_i \leq 1, \sum_{i=1}^{s} x_i = 1\}$. For a fixed graph $C = (c_{ij})$ with $s$ nodes,
consider the set of coupled ordinary differential equations [20]

\[
\dot{x}_i = \sum_{j=1}^{s} c_{ij} x_j - x_i \sum_{j,k=1}^{s} c_{kj} x_j . \tag{1}
\]

This will be the dynamical system of interest to us in this section.

Note that the dynamics preserves the normalization of $x$, $\sum_{i=1}^{s} \dot{x}_i = 0$. For non-negative
$C$ it leaves the simplex $J$ invariant. (For negative $c_{ij}$, additional conditions have to be added
(see [21]) but we do not discuss that case here.)
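
Equation (1) is straightforward to integrate numerically. The following sketch uses a plain forward-Euler scheme, which is a choice made here for brevity and not a method taken from the text; the step size, the number of steps, the initial condition and the 3-node graph (a 2-cycle between nodes 0 and 1 with a link from 1 to 2) are all illustrative assumptions.

```python
def rhs(C, x):
    """Right-hand side of equation (1) for adjacency matrix C and state x."""
    s = len(x)
    Cx = [sum(C[i][j] * x[j] for j in range(s)) for i in range(s)]
    total = sum(Cx)  # sum over k and j of c_kj x_j
    return [Cx[i] - x[i] * total for i in range(s)]


def integrate(C, x0, dt=0.01, steps=5000):
    """Forward-Euler integration of equation (1) starting from x0."""
    x = list(x0)
    for _ in range(steps):
        dx = rhs(C, x)
        x = [xi + dt * dxi for xi, dxi in zip(x, dx)]
    return x


# C[i][j] == 1 means a link j -> i: links 0 -> 1, 1 -> 0 and 1 -> 2.
C = [[0, 1, 0],
     [1, 0, 0],
     [0, 1, 0]]
x_final = integrate(C, [0.7, 0.2, 0.1])
print(x_final, sum(x_final))
```

For this example the only eigenvector of $C$ with eigenvalue 1 is proportional to (1, 1, 1), so the trajectory should approach (1/3, 1/3, 1/3) while the components keep summing to one.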

The links of the graph represent the interactions between the variables $x_i$ that live on
the nodes. $x_i$ could represent, for example, the relative population of the $i$th species in a
population of $s$ species, or the probability of the $i$th strategy among a group of $s$ strategies
in an evolutionary game, or the market share of the $i$th company among a set of competing
companies, etc. It is useful to see how equation (1) arises in a population dynamic context.

Let $i \in \{1, \ldots, s\}$ denote a chemical (or molecular) species in a chemical reactor. Molecules
can react with each other in various ways; we focus on only one aspect of their interactions:
catalysis. The catalytic interactions can be described by a directed graph with $s$ nodes. The
nodes represent the $s$ species and the existence of a link from node $j$ to node $i$ means that
species $j$ is a catalyst for the production of species $i$. In terms of the adjacency matrix,
$C = (c_{ij})$ of this graph, $c_{ij}$ is set to unity if $j$ is a catalyst of $i$ and is set to zero otherwise.
The operational meaning of catalysis is as follows:

Each species $i$ will have an associated non-negative population $y_i$ in the pond which
changes with time. In a certain approximation (discussed below) the population dynamics
for a fixed set of chemical species whose interactions are given by $C$, will be given by

\[
\dot{y}_i = \sum_{j=1}^{s} c_{ij} y_j - \phi y_i , \tag{2}
\]

where $\phi(t)$ is some function of time. To see how such an equation might arise, assume that
species $j$ catalyses the ligation of reactants $A$ and $B$ to form the species $i$, $A + B \xrightarrow{j} i$. Then
the rate of growth of the population $y_i$ of species $i$ in a well stirred reactor will be given by
$\dot{y}_i = k(1 + \nu y_j) n_A n_B - \phi y_i$, where $n_A$, $n_B$ are reactant concentrations, $k$ is the rate constant
for the spontaneous reaction, $\nu$ is the catalytic efficiency, and $\phi$ represents a common death
rate or dilution flux in the reactor. Assuming the catalysed reaction is much faster than the
spontaneous reaction, and that the concentrations of the reactants are large and fixed, the rate
equation becomes $\dot{y}_i = K y_j - \phi y_i$, where $K$ is a constant. In general since species $i$ can have
multiple catalysts, we get $\dot{y}_i = \sum_{j=1}^{s} K_{ij} y_j - \phi y_i$ with $K_{ij} \sim c_{ij}$. We make the further
idealization $K_{ij} = c_{ij}$ giving equation (2).

The relative population of species $i$ is by definition $x_i \equiv y_i / \sum_{j=1}^{s} y_j$.
Therefore $x \equiv (x_1, \ldots, x_s) \in J$, since $0 \le x_i \le 1$ and $\sum_{i=1}^{s} x_i = 1$.
Taking the time derivative of $x_i$ and using (2) it is easy to see that $\dot{x}_i$ is given by (1).
Note that the $\phi$ term, present in (2), cancels out and is absent in (1).
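For completeness, the cancellation can be written out explicitly. The short derivation below is a sketch that uses only equation (2) and the definition of $x_i$; the symbol $Y$ for the total population is introduced here for convenience and is not part of the original notation.

```latex
\begin{align*}
Y &\equiv \sum_{k=1}^{s} y_k , \qquad x_i = \frac{y_i}{Y}, \qquad
\dot{Y} = \sum_{k,j=1}^{s} c_{kj} y_j - \phi Y , \\
\dot{x}_i &= \frac{\dot{y}_i}{Y} - x_i \frac{\dot{Y}}{Y}
  = \Big( \sum_{j=1}^{s} c_{ij} x_j - \phi x_i \Big)
    - x_i \Big( \sum_{k,j=1}^{s} c_{kj} x_j - \phi \Big) \\
 &= \sum_{j=1}^{s} c_{ij} x_j - x_i \sum_{k,j=1}^{s} c_{kj} x_j .
\end{align*}
```

The two $\phi x_i$ terms cancel, which is why the relative populations do not depend on the choice of $\phi(t)$.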

We remark that the quasispecies equation [17] has the same form as equation (2), albeit
with a different interpretation and a special structure of the $C$ matrix that arises from that
interpretation.



0.3.1 Attractors of equation (1)

The rest of this section consists of examples and arguments to justify the following.

Proposition 5. For any graph $C$,

(i) Every eigenvector of $C$ that belongs to $J$ is a fixed point of (1), and vice versa.

(ii) Starting from any initial condition in the simplex $J$, the trajectory converges to some fixed
point (generically denoted $X$) in $J$.

(iii) For generic initial conditions in $J$, $X$ is a Perron-Frobenius eigenvector (PFE) of $C$.
(For special initial conditions, forming a space of measure zero in $J$, $X$ could be some other
eigenvector of $C$. Henceforth we ignore such special initial conditions.)

(iv) If $C$ has a unique PFE, $X$ is the unique stable attractor of (1).

(v) If $C$ has more than one linearly independent PFE, then $X$ can depend upon the initial
conditions. The set of allowed $X$ consists of convex linear combinations of a subset of the PFEs.
The interior of this convex set in $J$ may then be said to be the 'attractor' of (1), in the sense
that for generic initial conditions all trajectories converge to a point in this set.

(vi) For every $X$ belonging to the attractor set, the set of nodes $i$ for which $X_i > 0$ is the
same and is uniquely determined by $C$. The subgraph formed by this set of nodes will be called the
'subgraph of the attractor' of (1) for the graph $C$. Physically, this set consists of nodes that
always end up with a nonzero relative population when the dynamics (1) is allowed to run its
course, starting from generic initial conditions.

(vii) If $\lambda_1(C) > 0$, the subgraph of the attractor of (1) is an ACS. This ACS will be
called the dominant ACS of the graph. The dominant ACS is independent of (generic) initial
conditions and depends only on $C$.

For example, for the graph of Figure 1a, $X$ is a convex linear combination of $e_2$ and $e_3$,
$X = a e_2 + (1 - a) e_3$, with $0 \le a \le 1$. $a$ depends upon initial conditions; generically
$0 < a < 1$. The subgraph of the attractor contains eight nodes, 6,7,14-19. Starting with
generic initial conditions where all the $x_i$ are nonzero, the trajectory will converge to a point
$X$ where these eight nodes have nonzero $X_i$ and each of the other twelve nodes has $X_i = 0$.
The eight populated nodes form an ACS, the dominant ACS of the graph.

To see (i), let $x^\lambda \in J$ be an eigenvector of $C$,
$\sum_j c_{ij} x^\lambda_j = \lambda x^\lambda_i$. Substituting this on the r.h.s. of (1), one gets
zero. Conversely, if the r.h.s. of (1) is zero, one finds $x = x^\lambda$, with
$\lambda = \sum_{k,j} c_{kj} x_j$.

To motivate (ii) and (iii) it is most convenient to consider the underlying dynamics (2)
from which (1) is derived. Since (1) is independent of $\phi$, we can set $\phi = 0$ in (2) without
any loss of generality. With $\phi = 0$ the general solution of (2), which is a linear system, can
be schematically written as:

$$y(t) = e^{Ct} y(0),$$

where $y(0)$ and $y(t)$ are viewed as column vectors. Suppose $y(0)$ is a right eigenvector of $C$
with eigenvalue $\lambda$, denoted $y^\lambda$. Then

$$y(t) = e^{\lambda t} y^\lambda .$$

Since this time dependence is merely a rescaling of the eigenvector, this is an alternative way
of seeing that $x^\lambda = y^\lambda / \sum_{i=1}^{s} y^\lambda_i$ is a fixed point of (1). If the
eigenvectors of $C$ form a basis in $\mathbf{R}^s$, $y(0)$ is a linear combination:
$y(0) = \sum_\lambda a_\lambda y^\lambda$. In that case, for large $t$ it is clear that the term
with the largest value of $\lambda$ will win out, hence

$$y(t) \xrightarrow{\; t \to \infty \;} e^{\lambda_1 t} y^{\lambda_1},$$

where $\lambda_1$ is the eigenvalue of $C$ with the largest real part (which we know is the same as
its Perron-Frobenius eigenvalue) and $y^{\lambda_1}$ an associated eigenvector. Therefore, for
generic initial conditions the trajectory of (1) will converge to $X = x^{\lambda_1}$, a PFE of $C$.
If the eigenvectors of $C$ do not form a basis in $\mathbf{R}^s$, the above result is still true
(as we will see in examples).
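As a numerical illustration of this argument, the following minimal Python sketch integrates equation (1) and compares the end point with the eigenvector of $C$ belonging to the eigenvalue of largest real part. The function names, the Runge-Kutta scheme, the step size, the tolerance and the three-node test graph (a 2-cycle with one downstream node) are illustrative assumptions, not part of the original analysis.

```python
import numpy as np

def rhs(C, x):
    """Right-hand side of equation (1): (Cx)_i - x_i * sum_k (Cx)_k."""
    Cx = C @ x
    return Cx - x * Cx.sum()

def attractor(C, x0, dt=0.01, max_steps=200_000, tol=1e-13):
    """Integrate equation (1) from x0 until x stops changing.

    Assumed scheme: classical RK4 with renormalisation onto the simplex J.
    Graphs whose attractor is approached only as a power of t may exhaust
    max_steps before reaching the tolerance.
    """
    x = np.asarray(x0, dtype=float)
    x = x / x.sum()
    for _ in range(max_steps):
        k1 = rhs(C, x)
        k2 = rhs(C, x + 0.5 * dt * k1)
        k3 = rhs(C, x + 0.5 * dt * k2)
        k4 = rhs(C, x + dt * k3)
        x_new = x + dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0
        x_new = np.clip(x_new, 0.0, None)
        x_new /= x_new.sum()
        if np.max(np.abs(x_new - x)) < tol:
            return x_new
        x = x_new
    return x

def perron_frobenius_eigenvector(C):
    """Return (lambda_1, eigenvector normalised to sum 1).

    Only one eigenvector is returned, so this is meaningful when the PFE
    is unique.
    """
    vals, vecs = np.linalg.eig(C)
    k = np.argmax(vals.real)
    v = vecs[:, k].real
    return vals[k].real, v / v.sum()

# c_ij = 1 if node j catalyses node i: nodes 1 and 2 form a 2-cycle,
# node 3 is catalysed by node 2.
C = np.array([[0, 1, 0],
              [1, 0, 0],
              [0, 1, 0]], dtype=float)

X = attractor(C, [0.5, 0.2, 0.3])
lam1, pfe = perron_frobenius_eigenvector(C)
print(X, lam1, pfe)
```

For this test graph the integrated end point and the normalised eigenvector agree, and the eigenvector makes the right-hand side of (1) vanish, as stated in part (i) of the proposition.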

Note that $\lambda_1$ can be interpreted as the 'population growth rate' at large $t$, since
$\dot{y}(t) \to \lambda_1 y(t)$. In the previous section we had mentioned that $\lambda_1$ measures a
topological property of the graph, namely, the multiplicity of internal pathways in the core of
the graph. Thus in the present model, $\lambda_1$ has both a topological and dynamical significance,
which relates two distinct properties of the system, one structural (multiplicity of pathways in
the core of the graph), and the other dynamical (population growth rate). The higher the
multiplicity of pathways in the core, the greater is the population growth rate of the dominant ACS.

Part (iv) follows from the above. We will give examples as illustrations of (v) and (vi).
Further, from Proposition 3, previous section, we know that the subgraph of a PFE has to be
an ACS, whenever $\lambda_1 > 0$. That explains (vii). It is instructive to consider examples of
graphs and see how the trajectory converges to a PFE.

Example 1. A simple chain, Figure [|a: 

The adjacency matrix of this graph has all eigenvalues (including $\lambda_1$) zero. There is only
one (normalized) eigenvector corresponding to this eigenvalue, namely $e = (0,0,1)$, and
this is the unique PFE of the graph. (This is an example where the eigenvectors of $C$ do
not form a basis in $\mathbf{R}^s$.) Since node 1 has no catalyst, its rate equation is (henceforth
taking $\phi = 0$) $\dot{y}_1 = 0$. Therefore $y_1(t) = y_1(0)$, a constant. The rate equation for
node 2 is $\dot{y}_2 = y_1 = y_1(0)$. Thus $y_2(t) = y_2(0) + y_1(0) t$. Similarly
$\dot{y}_3 = y_2$ implies that $y_3(t) = (1/2) y_1(0) t^2 + y_2(0) t + y_3(0)$. At large $t$,
$y_1 = $ constant, $y_2 \sim t$, $y_3 \sim t^2$; hence $y_3$ dominates. Therefore,
$X_i = \lim_{t \to \infty} x_i(t)$ is given by $X_1 = 0$, $X_2 = 0$, $X_3 = 1$. Thus we
find that $X$ equals the unique PFE $e$, independent of initial conditions.

Example 2. A 1-cycle, Figure ||b:

This graph has two eigenvalues, $\lambda_1 = 1$, $\lambda_2 = 0$. The unique PFE is $e = (1,0)$.
The rate equations are $\dot{y}_1 = y_1$, $\dot{y}_2 = 0$, with the solutions
$y_1(t) = y_1(0) e^t$, $y_2(t) = y_2(0)$. At large $t$ node 1 dominates, hence $X = (1,0) = e$.
The exponentially growing population of 1 is a consequence of the fact that 1 is a
self-replicator, as embodied in the equation $\dot{y}_1 = y_1$.

Example 3. A 2-cycle, Figure 

The corresponding adjacency matrix has eigenvalues $\lambda_1 = 1$, $\lambda_2 = -1$. The unique
normalized PFE is $e = (1/2, 1/2)$. The population dynamics equations are $\dot{y}_1 = y_2$,
$\dot{y}_2 = y_1$. The general solution to these is (note $\ddot{y}_1 = y_1$)

$$y_1(t) = A e^{t} + B e^{-t}, \qquad y_2(t) = A e^{t} - B e^{-t}.$$

Therefore at large $t$, $y_1 \to A e^t$, $y_2 \to A e^t$, hence $X = (1,1)/2 = e$. Neither 1 nor 2
is individually a self-replicating species, but collectively they function as a self-replicating
entity. This is true of all ACSs.



Example 4. A 2-cycle with a periphery, Figure |p: 

This graph has $\lambda_1 = 1$ and a unique normalized PFE $e = (1,1,1)/3$. The population
equations for $y_1$ and $y_2$ and consequently their general solutions are the same as in Example 3,
but now in addition $\dot{y}_3 = y_2$, yielding $y_3(t) = A e^{t} + B e^{-t} + $ constant. Again for
large $t$, $y_1, y_2, y_3$ grow as $\sim A e^t$, hence $X = (1,1,1)/3 = e$. The dominant ACS
includes all the three nodes.

This example shows how a parasitic periphery (which does not feed back into the core) is
supported by an autocatalytic core. This is also an example of the following general result:
when a subgraph $C'$, with largest eigenvalue $\lambda'_1$, is downstream from another subgraph
$C''$ with largest eigenvalue $\lambda''_1 > \lambda'_1$, then the population of the former also
increases at the rate $\lambda''_1$. Therefore if $C''$ is populated in the attractor, so is $C'$.
In this example $C'$ is the single node 3 with $\lambda'_1 = 0$ and $C''$ is the 2-cycle of nodes
1 and 2 with $\lambda''_1 = 1$.

Example 5. A 2-cycle and a chain, Figure 

The graph in Figure [|e combines the graphs of Figures [|a and c. Following the analysis of
those two examples it is evident that for large $t$, $y_1 \sim t^0$, $y_2 \sim t^1$,
$y_3 \sim t^2$, $y_4 \sim e^t$, $y_5 \sim e^t$. Because the populations of the 2-cycle are growing
exponentially they will eventually completely overshadow the populations of the chain, which are
growing only as powers of $t$. Therefore the attractor will be $X = (0,0,0,1,1)/2$ which, it can
be checked, is a PFE of the graph (it is an eigenvector with eigenvalue 1).

In general when a graph consists of one or more ACSs and other nodes that are not part of 
any ACS, the populations of the ACS nodes grow exponentially while the populations of the 
latter nodes grow at best as powers of t. Hence ACSs always outperform non-ACS structures 
in the population dynamics (see also Example 2). This is a consequence of the infinite walks 
provided by the positive feedback inherent in the ACS structure, while non-ACS structures 
have no feedbacks and only finite walks. 

Example 6. A 2-cycle and another irreducible graph disconnected from it, Figure ||f:

One can ask, when there is more than one ACS in the graph, which is the dominant ACS?
Figure |]f shows a graph containing two ACSs. The 2-cycle subgraph has a Perron-Frobenius
eigenvalue 1, while the other irreducible subgraph has a Perron-Frobenius eigenvalue $\sqrt{2}$.
The unique PFE of the entire graph is $e = (0, 0, 1, \sqrt{2}, 1)/(2 + \sqrt{2})$ with eigenvalue
$\sqrt{2}$. The population dynamics equations are $\dot{y}_1 = y_2$, $\dot{y}_2 = y_1$,
$\dot{y}_3 = y_4$, $\dot{y}_4 = y_3 + y_5$, $\dot{y}_5 = y_4$. The first two equations are
completely decoupled from the last three and the solutions for $y_1$ and $y_2$ are the same as for
Example 3. For the other irreducible graph the solution is (since
$\ddot{y}_4 = \dot{y}_3 + \dot{y}_5 = 2 y_4$)

$$y_4(t) = A e^{\sqrt{2} t} + B e^{-\sqrt{2} t}, \qquad
  y_3(t) = \frac{1}{\sqrt{2}} \left( A e^{\sqrt{2} t} - B e^{-\sqrt{2} t} \right) + C,$$

$$y_5(t) = \frac{1}{\sqrt{2}} \left( A e^{\sqrt{2} t} - B e^{-\sqrt{2} t} \right) - C.$$

Thus, the populations of nodes 3, 4 and 5 also grow exponentially but at a faster rate, reflecting
the higher Perron-Frobenius eigenvalue of the subgraph comprising those nodes. Therefore
this structure eventually overshadows the 2-cycle, and the attractor is $X = e$. The dominant
ACS in this case is the irreducible subgraph formed by nodes 3, 4 and 5.
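The eigenvalue and PFE quoted in this example can be checked numerically. The sketch below builds the adjacency matrix directly from the five rate equations of Example 6 (the index convention $c_{ij} = 1$ if $j$ catalyses $i$ is the one used throughout; the variable names are illustrative) and uses NumPy's general eigensolver.

```python
import numpy as np

# (i, j) means "node j catalyses node i", read off from
# ydot1 = y2, ydot2 = y1, ydot3 = y4, ydot4 = y3 + y5, ydot5 = y4.
links = [(1, 2), (2, 1), (3, 4), (4, 3), (4, 5), (5, 4)]

C = np.zeros((5, 5))
for i, j in links:
    C[i - 1, j - 1] = 1.0   # nodes 1..5 are stored at indices 0..4

vals, vecs = np.linalg.eig(C)
k = np.argmax(vals.real)            # eigenvalue with the largest real part
lam1 = vals[k].real
pfe = vecs[:, k].real
pfe = pfe / pfe.sum()               # normalise so the components sum to 1

expected = np.array([0.0, 0.0, 1.0, np.sqrt(2.0), 1.0]) / (2.0 + np.sqrt(2.0))
print(lam1, pfe, expected)
```

The largest eigenvalue comes out as $\sqrt{2}$ and the normalised eigenvector has zero weight on the 2-cycle nodes 1 and 2, in agreement with the text.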

More generally, when a graph consists of several disconnected ACSs with different individual
$\lambda_1$, only the ACSs whose $\lambda_1$ is the largest (and equal to $\lambda_1(C)$) end up
with non-zero relative populations in the attractor.

Example 7. A 2-cycle downstream from another 2-cycle, Figure |]g: 

What happens when the graph contains two ACSs whose individual $\lambda_1$ equals $\lambda_1(C)$,
and one of those ACSs is downstream of another? In Figure ||g nodes 3 and 4 form a 2-cycle which is
downstream from another 2-cycle comprising nodes 1 and 2. The unique PFE of this graph,
with $\lambda_1 = 1$, is $e = (0, 0, 1, 1)/2$. The population dynamics equations are
$\dot{y}_1 = y_2$, $\dot{y}_2 = y_1$, $\dot{y}_3 = y_4 + y_2$, $\dot{y}_4 = y_3$. Their general
solution is:

$$y_1(t) = A e^{t} + B e^{-t}, \qquad y_2(t) = A e^{t} - B e^{-t},$$

$$y_3(t) = \frac{t}{2} \left( A e^{t} - B e^{-t} \right) + C e^{t} + D e^{-t},$$

$$y_4(t) = \frac{t}{2} \left( A e^{t} + B e^{-t} \right)
  + \left( C - \frac{A}{2} \right) e^{t} + \left( \frac{B}{2} - D \right) e^{-t}.$$

It is clear that for large $t$, $y_1 \sim e^t$, $y_2 \sim e^t$, $y_3 \sim t e^t$,
$y_4 \sim t e^t$. While all four grow exponentially with the same rate $\lambda_1$, as
$t \to \infty$ $y_3$ and $y_4$ will overshadow $y_1$ and $y_2$. The attractor will therefore be
$X = (0, 0, 1, 1)/2 = e$. Here the dominant ACS is the 2-cycle of nodes 3 and 4. This result
generalizes to other kinds of ACSs: if one irreducible subgraph is downstream of another with the
same Perron-Frobenius eigenvalue, the latter will have zero relative population in the attractor.
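The slow, power-law approach to the attractor in this example can be seen numerically. The sketch below integrates equation (1) for the graph of Example 7 from the uniform initial condition $x_i(0) = 1/4$. For that particular initial condition one can check by hand from the rate equations that $x_1(t) = 1/(4+t)$ (with $y_1 = y_2 = e^t/4$ and $y_3 + y_4 = (1/2 + t/4)e^t$); this closed form, the RK4 integrator and all names are assumptions introduced for the illustration.

```python
import numpy as np

def rhs(C, x):
    """Right-hand side of equation (1)."""
    Cx = C @ x
    return Cx - x * Cx.sum()

def integrate(C, x0, t_end, dt=0.01):
    """Advance equation (1) to time t_end with fixed-step RK4 (assumed scheme)."""
    x = np.asarray(x0, dtype=float)
    x = x / x.sum()
    for _ in range(int(round(t_end / dt))):
        k1 = rhs(C, x)
        k2 = rhs(C, x + 0.5 * dt * k1)
        k3 = rhs(C, x + 0.5 * dt * k2)
        k4 = rhs(C, x + dt * k3)
        x = x + dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0
        x = x / x.sum()   # remove round-off drift off the simplex
    return x

# ydot1 = y2, ydot2 = y1, ydot3 = y4 + y2, ydot4 = y3
C = np.array([[0, 1, 0, 0],
              [1, 0, 0, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)

x0 = [0.25, 0.25, 0.25, 0.25]
x_at_100 = integrate(C, x0, 100.0)
print(x_at_100[0], 1.0 / 104.0)
```

Even at $t = 100$ the upstream 2-cycle still holds about 2% of the population: its share decays only as $1/t$, not exponentially.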

The above examples displayed graphs with a unique PFE, and illustrated Proposition 5 
(iv). The stability of the global attractor follows from the fact that the constants A, B, C, D, 
etc., in the above examples, which can be traded for the initial conditions of the populations, 
appear nowhere in the attractor configuration X. Now we consider examples where the PFE 
is not unique. 

Example 8. Graph with $\lambda_1 = 0$ and three disconnected components, Figure ^a:

As mentioned in section 2 this graph has three independent PFEs, displayed in Figure ||a. The
attractor is $X = e_3$. This is an immediate generalization of Example 1 above. Using the same
argument as for Example 1, we can see that $y_i \sim t^k$ if the longest path ending at node $i$
is of length $k$. Therefore the attractor will have nonzero components only for nodes at the ends
of the longest paths. Thus the populations of nodes 1, 2, 3 and 5 are constant, those of 4 and 6
increase $\sim t$ for large $t$, and that of 7 as $\sim t^2$, explaining the result.
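The rule "$y_i \sim t^k$ where $k$ is the length of the longest path ending at node $i$" is easy to evaluate for any acyclic graph. The sketch below does so by memoised recursion. The link list is a hypothetical wiring chosen only to reproduce the growth exponents quoted in the text (three components, seven nodes); it is not claimed to be the graph in the figure, and the function name is illustrative.

```python
from functools import lru_cache

def longest_path_lengths(nodes, links):
    """Length of the longest path ending at each node.

    links is a list of (j, i) pairs meaning "j catalyses i" (a link j -> i).
    The graph is assumed to be acyclic (lambda_1 = 0); a cycle would make
    the recursion below fail to terminate.
    """
    preds = {i: [] for i in nodes}
    for j, i in links:
        preds[i].append(j)

    @lru_cache(maxsize=None)
    def k(i):
        if not preds[i]:
            return 0
        return 1 + max(k(j) for j in preds[i])

    return {i: k(i) for i in nodes}

# Hypothetical wiring: node 1 isolated, 2 -> 4, and 3 -> 6, 5 -> 6, 6 -> 7.
nodes = range(1, 8)
links = [(2, 4), (3, 6), (5, 6), (6, 7)]

lengths = longest_path_lengths(nodes, links)
k_max = max(lengths.values())
populated = [i for i in nodes if lengths[i] == k_max]
print(lengths, populated)
```

Only the nodes with the maximal exponent survive in the attractor; for this wiring that is node 7 alone.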

Example 9. Several connected components containing 2-cycles, Figure ||b:

Here again there are three PFEs, one for each connected component. The population of nodes
in 2-cycles which are not downstream of other 2-cycles (nodes 1, 2, 3, 4, 7 and 8) will grow as
$e^t$. As in Example 7, Figure |]g, the nodes of 2-cycles which are downstream of one 2-cycle
(nodes 5, 6, 9 and 10) will grow as $t e^t$. It can be verified that the populations of nodes in
2-cycles downstream from two other 2-cycles (nodes 11 and 12) will grow as $t^2 e^t$. The pattern
is clear: in the attractor only the 2-cycles at the ends of the longest chains of 2-cycles will
have non-zero relative populations, explaining the result.

Example 10. Figure 1a:

From previous examples it is evident how the populations will change with time for Figure
1a. Here we list the result:

2/8 2/9 yio^t 2 , 

$$y_1, y_2, y_3, y_4, y_5, y_{11}, y_{12}, y_{13}, y_{20} \sim e^t,$$
$$y_6, y_7, y_{14}, y_{15}, y_{16}, y_{17}, y_{18}, y_{19} \sim t e^t.$$

Thus, starting from a generic initial population, only the eight nodes, 6,7,14-19, will be
populated in the attractor. This explains the comments just after the statement of Proposition 5.

Note the structure of the dominant ACS in the above examples when $\lambda_1 > 0$. If there is
a unique PFE in the graph, the dominant ACS is the subgraph of the PFE. If there are several
PFEs only a subset of those may be counted, as illustrated in Examples 9 and 10, Figures ||b
and 1a, respectively. A general construction of the dominant ACS for an arbitrary graph will
be described elsewhere.



How long does it take to reach the attractor? 

The timescale over which the system reaches its attractor depends on the structure of the graph
$C$. For instance in Example 2, the attractor is approached as the population of node 1, $y_1$,
overwhelms the population $y_2$. Since $y_1$ grows exponentially as $e^t$, the attractor is reached
on a timescale $\lambda_1^{-1} = 1$. (In general, when we say that "the timescale for the system to
reach the attractor is $\tau$", we mean that for $t \gg \tau$, $x(t)$ is "exponentially close" to
its final destination $X \equiv \lim_{t \to \infty} x(t)$, i.e. for all $i$,
$|x_i(t) - X_i| \sim e^{-t/\tau} t^{\alpha}$, with some finite $\alpha$.) In contrast, in
Example 1, the attractor is approached as $y_3$ overwhelms $y_1$ and $y_2$. Because in this case all
the populations are growing as powers of $t$, the timescale for reaching the attractor is infinite.
When the populations of different nodes are growing at different rates, this timescale depends
on the difference in growth rate between the fastest growing population and the next fastest
growing population.

For graphs which have no basic subgraphs, i.e., graphs with $\lambda_1 = 0$ like those in Examples
1 and 8, all populations grow as powers of $t$, hence the timescale for reaching the attractor is
infinite.

For graphs which have one or more basic subgraphs (i.e., Ai > 1) but all the basic sub- 
graphs are in different connected components, such as Examples 2-6, the timescale for reach- 
ing the attractor is given by (Ai — ReA2) ~ , where A2 is the eigenvalue of C with the next 
largest real part, compared to Ai. 

For graphs having one or more basic subgraphs with at least one basic subgraph down- 
stream from another basic subgraph, the ratio of the fastest growing population to the next 
fastest growing will always be a power of t (as in Examples 7, 9 and 10) therefore the timescale 
for reaching the attractor is again infinite. 
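For the middle case, where the timescale is finite, the estimate $(\lambda_1 - \mathrm{Re}\,\lambda_2)^{-1}$ can be evaluated directly from the spectrum of $C$. The helper below is a minimal sketch: its name is illustrative, it takes $\lambda_2$ to be the largest real part strictly below $\lambda_1$ (an assumption made to handle a repeated $\lambda_1$), and it does not itself check that the graph belongs to the case to which the formula applies.

```python
import numpy as np

def approach_timescale(C, eps=1e-9):
    """Return 1 / (lambda_1 - Re lambda_2) for the adjacency matrix C.

    Meaningful only for graphs with lambda_1 >= 1 whose basic subgraphs lie
    in different connected components; for the other two cases discussed in
    the text the timescale is infinite regardless of this number.
    """
    real_parts = np.linalg.eigvals(np.asarray(C, dtype=float)).real
    lam1 = real_parts.max()
    rest = real_parts[real_parts < lam1 - eps]
    if rest.size == 0:
        raise ValueError("no eigenvalue with real part below lambda_1")
    return 1.0 / (lam1 - rest.max())

# Example 2: ydot1 = y1, ydot2 = 0.
one_cycle = [[1, 0],
             [0, 0]]

# Example 3: ydot1 = y2, ydot2 = y1.
two_cycle = [[0, 1],
             [1, 0]]

# Example 6: a 2-cycle plus the disconnected irreducible graph on nodes 3, 4, 5.
example6 = [[0, 1, 0, 0, 0],
            [1, 0, 0, 0, 0],
            [0, 0, 0, 1, 0],
            [0, 0, 1, 0, 1],
            [0, 0, 0, 1, 0]]

tau_one_cycle = approach_timescale(one_cycle)
tau_two_cycle = approach_timescale(two_cycle)
tau_example6 = approach_timescale(example6)
print(tau_one_cycle, tau_two_cycle, tau_example6)
```

For Example 2 this reproduces the timescale 1 quoted above; for Example 6 the competing 2-cycle with eigenvalue 1 sets $\lambda_2$, giving $1/(\sqrt{2} - 1)$.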

Core and periphery of a graph 

Since the dominant ACS is given by a PFE, we will define the core of the dominant ACS to be
the core of the corresponding PFE. If the PFE is simple, the core of the dominant ACS consists
of just one basic subgraph. If the PFE is non-simple the core of the dominant ACS will be
a union of some basic subgraphs. Further, the dominant ACS is uniquely determined by the
graph. This motivates the definition of the core and periphery of a graph: The core of a graph
$C$, denoted $Q(C)$, is the core of the dominant ACS of $C$. The periphery of $C$ is the periphery
of the dominant ACS of $C$. This definition applies when $\lambda_1(C) > 0$. When
$\lambda_1(C) = 0$, the graph has no ACS and by definition $Q(C) = \emptyset$. In all cases
$\lambda_1(Q(C)) = \lambda_1(C)$. For all the graphs depicted in this paper, except the one in
Figure |l^, the red nodes constitute the core of the graph, the blue nodes its periphery, and the
white nodes are neither core nor periphery - they are nodes that are not in any of the PFE
subgraphs.$^1$



Core overlap of two graphs 

Given any two graphs $C$ and $C'$ whose nodes are labeled, the core overlap between them,
denoted $Ov(C, C')$, is the number of common links in the cores of $C$ and $C'$, i.e., the number
of ordered pairs $(j, i)$ for which $Q_{ij}$ and $Q'_{ij}$ are both non-zero |^]. If either of $C$
or $C'$ does not have a core, $Ov(C, C')$ is identically zero.
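Once the two cores are known, the overlap is a simple count. The sketch below assumes that each core is supplied as an $s \times s$ array over the common node labelling, with zeros outside the core, and that a missing core is encoded as `None`; both conventions and the function name are choices made for this illustration. Finding the core itself is not attempted here.

```python
import numpy as np

def core_overlap(Q, Q_prime):
    """Number of ordered pairs (j, i) with Q_ij and Q'_ij both non-zero.

    Q and Q_prime are the adjacency matrices of the two cores, written on the
    same node labelling; None stands for "this graph has no core".
    """
    if Q is None or Q_prime is None:
        return 0
    Q = np.asarray(Q)
    Q_prime = np.asarray(Q_prime)
    return int(np.count_nonzero((Q != 0) & (Q_prime != 0)))

# Core of the first graph: a 2-cycle between nodes 1 and 2 (indices 0 and 1).
Q = np.zeros((4, 4), dtype=int)
Q[0, 1] = Q[1, 0] = 1

# Core of the second graph: the 3-cycle 1 -> 2 -> 3 -> 1 (c_ij = 1 for j -> i).
Q_prime = np.zeros((4, 4), dtype=int)
Q_prime[1, 0] = Q_prime[2, 1] = Q_prime[0, 2] = 1

shared = core_overlap(Q, Q_prime)      # only the link 1 -> 2 is common
no_core = core_overlap(Q, None)
same = core_overlap(Q, Q)
print(shared, no_core, same)
```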



Keystone nodes 

In ecology certain species are referred to as keystone species - those whose extinction or
removal would seriously disturb the balance of the ecosystem [24, 25, 26, 27]. One might
similarly ask for the notion of a keystone node in a directed graph that captures some important
organizational role played by a node. Consider the impact of the hypothetical removal of any
node $i$ from a graph $C$. One can, for example, ask for the core of the graph $C - i$ that would
result if node $i$ (along with all its links) were removed from $C$. We will refer to a node $i$ as
a keystone node if $C$ has a non-vanishing core and $Ov(C, C - i) = 0$. Thus a keystone
node is one whose removal modifies the organizational structure of the graph (as represented
by its core) drastically. In each of Figures Qa-d, for example, the core is the entire graph. In
panel a all the nodes are keystone, since the removal of any one of them would leave the
graph without an ACS (and hence without a core). In general when the core of a graph is a
single $n$-cycle, for any $n$, all the core nodes are keystone. In Figure [|b, nodes 3, 4 and 5 are
keystone but the other nodes are not, and in Figure ^c only nodes 4 and 5 are keystone. In
Figure |]d, there are no keystone nodes. These examples show that the more internal pathways
a core has (generally, this implies a higher value of $\lambda_1$), the less likely it is to have
keystone species, and hence the more robust its structure is to removal of nodes.

Figure 7 illustrates another type of graph structure which has a keystone node. The graph
in Figure 7a consists of a 2-cycle (nodes 4 and 5) downstream from an irreducible subgraph
consisting of nodes 1, 2 and 3. The core of this graph is the latter irreducible subgraph. Figure
7b shows the graph that results if node 3 is removed with all its links. This consists of one
2-cycle downstream from another. Though both 2-cycles are basic subgraphs of the graph,
as discussed in Example 7, Figure ||g, this graph has a unique (up to constant multiples) PFE,
whose subgraph consists of the downstream cycle (nodes 4 and 5) only. Thus the 2-cycle 4-5
is the core of the graph in Figure 7b. Clearly $Ov(C, C - 3) = 0$; therefore node 3 in Figure 7a
is a keystone node.

We remark that the above purely graph theoretic definition of a keystone node turns out
to be useful in the dynamical system discussed in this and the following sections. For other
dynamical systems, other definitions of keystone might be more useful.

$^1$ The definition of the core of a graph given in refs. ^\ [2^] is a special case of this
definition, holding only for graphs where each connected component of the dominant ACS has no
more than one basic subgraph.

0.4 Graph dynamics 

So far we have discussed the algebraic properties of a fixed graph, and the attractors of a 
particular dynamical system on arbitrary, but fixed graphs. However one of the most interest- 
ing properties of complex systems is that the graph of interactions among their components 
evolves with time, resulting in many interesting adaptive phenomena. We now turn to such an 
example, where the graph itself is a dynamical variable, and display how phenomena such as 
self-organization, catastrophes, innovation, etc, can arise. We shall see that the above discus- 
sion of (static) graph theory will be crucial in understanding these phenomena. 

We consider a process which alters a graph in discrete steps. The series of graphs produced
by such a process can be denoted $C_n$, $n = 1, 2, \ldots$. A graph update event will be one step
of the process, taking a graph from $C_{n-1}$ to $C_n$. In fact the process we consider is a
specific example of a Markov process on the space of graphs. At time $n - 1$, the graph $C_{n-1}$
determines the transition probability to all other graphs. The stochastic process picks the new
graph $C_n$ using this probability distribution and the trajectory moves forward in graph space.
In the example we consider, the transition probability is not specified explicitly. It arises
implicitly as a consequence of the dynamics (1) that takes place on a fast time scale for the
fixed graph $C_{n-1}$.

The graph dynamics is implemented as follows [p0|:

Initially the graph is random: for every ordered pair $(i, j)$ with $i \ne j$, $c_{ij}$ is
independently chosen to be unity with a probability $p$ and zero with a probability $1 - p$.
$c_{ii}$ is set to zero for all $i$. Each $x_i$ is chosen randomly in $[0, 1]$ and all $x_i$ are
rescaled so that $\sum_{i=1}^{s} x_i = 1$.

Step 1. With $C$ fixed, $x$ is evolved according to (1) until it converges to a fixed point,
denoted $X$. The set $\mathcal{L}$ of nodes with the least $X_i$ is determined, i.e.,
$\mathcal{L} = \{ i \in S \mid X_i = \min_{j \in S} X_j \}$.

Step 2. A node, say node $k$, is picked randomly from $\mathcal{L}$ and is removed from the graph
along with all its links.

Step 3. A new node (also denoted $k$) is added to the graph. Links to and from $k$ to other
nodes are assigned randomly according to the same rule, i.e., for every $i \ne k$, $c_{ik}$ and
$c_{ki}$ are independently reassigned to unity with probability $p$ and zero with probability
$1 - p$, irrespective of their earlier values, and $c_{kk}$ is set to zero. All other matrix
elements of $C$ remain unchanged. $x_k$ is set to a small constant $x_0$, all other $x_i$ are
perturbed by a small amount from their existing value $X_i$, and all $x_i$ are rescaled so that
$\sum_{i=1}^{s} x_i = 1$.

This process, from step 1 onwards, is iterated many times.
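The initialization and steps 2 and 3 can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' implementation: the function names, the default values of `x0` and `noise`, the additive form of the perturbation and the tolerance used to decide which populations count as "least" are all assumptions. The fixed point $X$ required by step 1 is taken as an input here and has to be supplied by the caller.

```python
import numpy as np

def random_graph(s, p, rng):
    """Initial graph: each off-diagonal c_ij is 1 with probability p, c_ii = 0."""
    C = (rng.random((s, s)) < p).astype(int)
    np.fill_diagonal(C, 0)
    return C

def initial_populations(s, rng):
    """Each x_i drawn uniformly from [0, 1], then rescaled to sum to 1."""
    x = rng.random(s)
    return x / x.sum()

def graph_update(C, X, p, rng, x0=1e-4, noise=1e-4, tol=1e-12):
    """Steps 2 and 3, given the fixed point X reached in step 1.

    Returns the new graph, the new initial populations and the index k of
    the replaced node. x0, noise and tol are assumed values.
    """
    s = len(X)
    least_fit = np.flatnonzero(X <= X.min() + tol)    # the set L
    k = int(rng.choice(least_fit))                    # step 2: pick k from L

    C = C.copy()
    C[k, :] = rng.random(s) < p                       # step 3: new links into k
    C[:, k] = rng.random(s) < p                       #         new links out of k
    C[k, k] = 0                                       # no 1-cycles

    x = X + noise * rng.random(s)                     # small perturbation (assumed additive)
    x[k] = x0
    return C, x / x.sum(), k

rng = np.random.default_rng(0)
s, p = 20, 0.1
C0 = random_graph(s, p, rng)
x_start = initial_populations(s, rng)

# Stand-in for the result of step 1: pretend nodes 0..4 are populated.
X = np.zeros(s)
X[:5] = 0.2

C1, x1, k = graph_update(C0, X, p, rng)
print(k, x1.sum())
```

Only row $k$ and column $k$ of the matrix change in an update, so the rest of the graph, including any structure among the populated nodes, is carried over intact.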

Notice that the population dynamics and the graph dynamics are coupled: the evolution
of the $x_i$ depends on the graph $C$ in step 1, and the evolution of $C$ in turn depends on the
$x_i$ through the choice of which node to remove in step 2. There are two timescales in the
dynamics, a short timescale over which the graph is fixed while the $x_i$ evolve, and a longer
timescale over which the graph is changed.

This dynamics is motivated by the origin of life problem, in particular the puzzle of how
a complex chemical organization might have emerged from an initial 'random soup' of
chemicals, as discussed in section 1. Let us consider a pond on the prebiotic earth containing $s$
molecular species which interact catalytically as discussed in the previous section, and let us
allow the chemical organization to evolve with time due to various natural processes which
remove species from the pond and bring new species into the pond. Thus over short timescales
we let the populations of the species evolve according to (1). Over longer timescales we
imagine the prebiotic pond to be subject to periodic perturbations from storms, tides or floods.
These perturbations remove existing species from the pond and introduce new species into
it. The species most likely to be completely removed from the pond are those that have the
least number of molecules. The new species could have entirely different catalytic properties
from those removed or those existing in the pond. The above rules make the idealization that
the perturbation eliminates exactly one existing species (that has the least relative population)
and brings in one new species. The behaviour of the system does not depend crucially on this
assumption [fffi].

While in previous sections we have considered graphs with 1-cycles, the requirement
$c_{ii} = 0$ in the present section forbids 1-cycles in the graph. The motivation is the following:
1-cycles represent self-replicating species (see previous section, Example 2). Such species, e.g.,
RNA molecules, are difficult to produce and maintain in a prebiotic scenario and it is generally
believed that it requires a complex self-supporting molecular organization to be in place before
an RNA world, for example, can take off p9[]. Thus, we wish to address the question: can
we get complex molecular organizations without putting in self-replicating species by hand in
the model? As we shall see below, this does indeed happen, since even though self-replicating
individual species are disallowed, collectively self-replicating autocatalytic sets can still arise
by chance on a certain time scale, and when they do, they trigger a wave of self-organization
in the system.

The rules for changing the graph implement selection and novelty, two important features 
of natural evolution. Selection is implemented by removing the species which is 'performing 
the worst', with 'performance' in this case being equated to a species' relative population (step 
2). Adding a new species introduces novelty into the system. Note that although the actual 
connections of a new node with other nodes are created randomly, the new node has the same 
average connectivity as the initial set of nodes. Thus the new species is not biased in any way 
towards increasing the complexity of the chemical organization. Step 2 and step 3 represent 
the interaction of the system with the external environment. The third feature of the model 
is dynamics of the system that depends upon the interaction among its components (step 1). 
The phenomena to be described in the following sections are all consequences of the interplay 
between these three elements - selection, novelty and an internal dynamics. 

0.5 Self Organization 

We now discuss the results of graph evolution. Figure 8 shows the total number of links in the
graph versus time ($n$, the number of graph updates). Three runs of the model described in the
previous section, each with $s = 100$ and different values of $p$, are exhibited. Also exhibited
is a run where there was no selection (in which step 2 is modified: instead of picking one
of the nodes of $\mathcal{L}$, any one of the $s$ nodes is picked randomly and removed from the
graph along with all its links; the rest of the procedure remains the same). Figure 9 shows the
time evolution of two more quantities for the same three runs with selection displayed in Figure
8. The quantities plotted are the number of nodes with $X_i > 0$, $s_1$, and the Perron-Frobenius
eigenvalue of the graph, $\lambda_1$. The values of the parameters $p$ and $s$ for the displayed
runs were chosen to lie in the regime $ps < 1$. Much of the analytical work described below,
such as estimation of various timescales, assumes that $ps \ll 1$. Figure 10 shows snapshots
of the graph at various times in the run shown in Figure ^|b, which has $p = 0.0025$.

It is clear that without selection each graph update replaces a randomly chosen node with another
which has on average the same connectivity. Therefore the graph remains random like the
starting graph and the number of links fluctuates about its random graph value $\approx ps^2$. As
soon as selection is turned on the behaviour becomes more interesting. Three regimes can be
observed. First, the 'random phase' where the number of links fluctuates around $ps^2$ and $s_1$
is small. Second, the 'growth phase' where $l$ and $s_1$ show a clear rising tendency. Finally, the
'organized phase' where $l$ again hovers (with large fluctuations) about a value much higher
than the initial random graph value, and $s_1$ fluctuates (again with large fluctuations) about its
maximum value $s$. The time spent in each phase clearly depends on $p$, and we find it also
depends on $s$. This behaviour can be understood by taking a look at the structure of the graph
in each of these phases, especially the ACS structure, and using the results of sections 2 and 3.



0.5.1 The random phase 

Initially, the random graph contains no cycles, and hence no ACSs, and its Perron-Frobenius
eigenvalue is $\lambda_1 = 0$. We have seen in section 3 that for such a graph the attractor will
have nonzero components for all nodes which are at the ends of the longest paths of nodes, and zero
for every other node. (In Figure 10a, there are two paths of length 4, which are the longest
paths in the graph. Both end at node 13, which is therefore the only populated node in the
attractor for this graph.) These nodes, then, are the only nodes protected from elimination
during the graph update. However, these nodes have high relative populations because they
are supported by other nodes, while the latter (supporting) nodes do not have high relative
populations. Inevitably within a few graph updates a supporting node will be removed from
the graph. When that happens a node which presently has nonzero $X_i$ will no longer be at the
end of the longest path and hence will get $X_i = 0$. For example node 34, which belongs to
$\mathcal{L}$, is expected to be picked for replacement within $O(s)$ graph update time steps. In
fact it is replaced in the 8th time step. After that node 13 becomes a singleton and joins the set
$\mathcal{L}$. Thus no structure is stable when there is no ACS. Eventually, all nodes are removed
and replaced, and the graph remains random.

Note that the initial random graph is likely to contain no cycles when $p$ is small ($ps \ll 1$).
If larger values of $p$ are chosen, it becomes more likely that the initial graph will contain a
cycle. If it does, there is no random phase; the system is then in the growth phase, discussed
below, right from the initial time step.

0.5.2 The growth phase

At some graph update an ACS is formed by pure chance. The probability of this happening can
be closely approximated by the probability of a 2-cycle (the simplest ACS) forming by chance,
which is $p^2 s$ (= the probability that in the row and column corresponding to the replaced node
in $C$, any matrix element and its transpose both turn out to be unity). Thus the average time
of appearance of an ACS is $1/(p^2 s)$. In the run whose snapshots are displayed in Figure 10, a
2-cycle between nodes 26 and 90 formed at $n = 2854$. This is a graph which consists of a
2-cycle and several other chains and trees. For such a graph we have shown in Example 3 in
section 3 that the attractor has non-zero $X_i$ for nodes 26 and 90 and zero for all other nodes.
The dominant ACS consists of nodes 26 and 90. Therefore these nodes cannot be picked for
removal at the graph update and hence a graph update cannot destroy the links that make the
dominant ACS. The autocatalytic property is guaranteed to be preserved until the dominant
ACS spans the whole graph.

When a new node is added to the graph at a graph update, one of three things will happen: 

1. The new node will not have any links from the dominant ACS and will not form a 
new ACS. In this case the dominant ACS will remain unchanged, the new node will have 
zero relative population and will be part of the least fit set. For small p this is the most likely 
possibility. 

2. The new node gets an incoming link from the dominant ACS and hence becomes a part 
of it. In this case the dominant ACS grows to include the new node. For small p, this is less 
likely than the first possibility, but such events do happen and in fact are the ones responsible 
for the growth of complexity and structure in the graph. 

3. The new node forms another ACS. This new ACS competes with the existing dominant
ACS. Whether it becomes dominant and overshadows the previous dominant ACS, gets
overshadowed itself, or coexists with it depends on the Perron-Frobenius eigenvalues of
their respective subgraphs and on whether (and which) ACS is downstream of the other. It can be
shown that this is a rare event compared with possibilities 1 and 2.

Typically the dominant ACS keeps growing by accreting new nodes, usually one at a time,
until the entire graph is an ACS. At this point the growth phase stops and the organized phase
begins. As a consequence it follows that $\lambda_1$ is a nondecreasing function of $n$ as long as $s_1 < s$.



Time scale for growth of the dominant ACS. 

If we assume that possibility 3 above is rare enough to neglect, and that the dominant ACS
grows by adding a single node at a time, we can estimate the time required for it to span the
entire graph. Let the dominant ACS consist of $s_1(n)$ nodes at time $n$. The probability that the
new node gets an incoming link from the dominant ACS and hence joins it is $p s_1$. Thus in $\Delta n$
graph updates, the dominant ACS will grow, on average, by $\Delta s_1 = p s_1 \Delta n$ nodes. Therefore
$s_1(n) = s_1(n_a) \exp((n - n_a)/\tau_g)$, where $\tau_g = 1/p$, $n_a$ is the time of arrival of the first
ACS and $s_1(n_a)$ is the size of the first ACS (= 2 for the run shown in Figure 10). Thus $s_1$ is
expected to grow exponentially with a characteristic timescale $\tau_g = 1/p$. The time taken from
the arrival of the ACS to its spanning is $\tau_g \ln(s/s_1(n_a))$. This analytical result is confirmed
by simulations (see Figure 11).
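
The step from the average increment to the exponential law can be written out explicitly. The following is only a sketch: it assumes that each of the $s_1$ possible incoming links from the dominant ACS is present independently with probability $p$, and it treats $s_1$ as a continuous variable. Both are simplifications introduced here for illustration.

```latex
\begin{align*}
\Pr[\text{new node joins the dominant ACS}] &= 1 - (1-p)^{s_1} \approx p\,s_1
  \qquad (p\,s_1 \ll 1),\\
\frac{\Delta s_1}{\Delta n} = p\,s_1
  \;&\Longrightarrow\; \frac{ds_1}{dn} \approx p\,s_1
  \;\Longrightarrow\; s_1(n) = s_1(n_a)\,e^{\,p\,(n-n_a)} = s_1(n_a)\,e^{(n-n_a)/\tau_g},\\
s_1(n_a + T) = s
  \;&\Longrightarrow\; T = \frac{1}{p}\,\ln\frac{s}{s_1(n_a)} = \tau_g \ln\frac{s}{s_1(n_a)} .
\end{align*}
```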

In the displayed run, after the first ACS (a 2-cycle) is formed at $n = 2854$, it takes 1026
time steps, until $n = 3880$, for the dominant ACS to span the entire graph (Figure 10e). This
explains how an autocatalytic network structure and the positive feedback processes inherent
in it can bootstrap themselves into existence from a small seed. The small seed, in turn, is
more or less guaranteed to appear on a certain time scale ($1/(p^2 s)$ in the present model) just by
random processes.
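
To get a feel for the magnitudes involved, the two average time scales quoted above can be evaluated numerically. The sketch below is illustrative only: the function names are hypothetical, and the values $s = 100$, $p = 0.0025$ are borrowed from the set of runs analysed in section 0.6.1 rather than from the displayed run, whose parameters are not restated here.

```python
import math


def arrival_time(p: float, s: int) -> float:
    """Average waiting time 1/(p^2 s) for the first ACS (a 2-cycle) to appear."""
    return 1.0 / (p * p * s)


def growth_timescale(p: float) -> float:
    """Characteristic growth time tau_g = 1/p of the dominant ACS."""
    return 1.0 / p


def spanning_time(p: float, s: int, s_first: int = 2) -> float:
    """Average time tau_g * ln(s / s_1(n_a)) from arrival of the ACS to spanning."""
    return growth_timescale(p) * math.log(s / s_first)


# Illustrative parameter values (an assumption, see the lead-in above).
p_example, s_example = 0.0025, 100
print("average arrival time :", arrival_time(p_example, s_example))
print("growth time scale    :", growth_timescale(p_example))
print("average spanning time:", spanning_time(p_example, s_example))
```

These are averages over many runs; an individual run, such as the displayed one with its 1026 steps from arrival to spanning, can deviate from them considerably.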



A measure of the 'structure' of the evolved graph. 

A fully autocatalytic graph is a highly improbable structure. Consider a graph of $s$ nodes and
let the probability of a positive link existing between any pair of nodes be $p^*$. Such a graph
has on average $m^* = p^*(s - 1)$ incoming or outgoing positive links per node since links from
a node to itself are disallowed. For the entire graph to be an ACS, each node must have at least
one incoming link, i.e. each row of the matrix $C$ must contain at least one positive element.
Hence the probability, $P$, for the entire graph to be an ACS is
$P$ = probability that every row has at least one positive entry
= [probability that a row has at least one positive entry]$^s$
= [1 - (probability that every entry of a row is zero)]$^s$
= $[1 - (1 - p^*)^{s-1}]^s$
= $[1 - (1 - m^*/(s - 1))^{s-1}]^s$.

Note from Figure 8 that at spanning the number of links is $O(s)$. Thus the average degree
$m^*$ at spanning is $O(1)$. We have found this to be true in all the runs we have done where the
initial average degree (at $n = 1$) was $O(1)$ or less.

For large $s$ and $m^* \sim O(1)$, $P \approx (1 - e^{-m^*})^s \sim e^{-\alpha s}$, where $\alpha$ is positive and $O(1)$.
Thus a fully autocatalytic graph is exponentially unlikely to form if it were being assembled
randomly. In the present model nodes are being added completely randomly but the underlying
population dynamics and the selection imposed at each graph update result in the inevitable
arrival of an ACS (in, on average, $\tau_a = 1/(p^2 s)$ time steps) and its inevitable growth into a fully
autocatalytic graph in (on average) an additional $\sim \tau_g \ln s$ time steps.
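
The exponential smallness of $P$ is easy to check numerically. The sketch below evaluates the exact expression for $P$ and compares it with $e^{-\alpha s}$, taking $\alpha = -\ln(1 - e^{-m^*})$; this explicit form of $\alpha$ is read off from the large-$s$ limit above and is not spelled out in the text. The function names and the choice $m^* = 1$ are illustrative assumptions.

```python
import math


def prob_fully_autocatalytic(s: int, m_star: float) -> float:
    """P = [1 - (1 - m*/(s-1))^(s-1)]^s, the chance that every row of C is non-empty."""
    p_star = m_star / (s - 1)
    row_has_positive_entry = 1.0 - (1.0 - p_star) ** (s - 1)
    return row_has_positive_entry ** s


def alpha(m_star: float) -> float:
    """Assumed large-s decay rate: P ~ exp(-alpha * s), alpha = -ln(1 - exp(-m*))."""
    return -math.log(1.0 - math.exp(-m_star))


# Compare the exact value with the large-s estimate for a few graph sizes.
for s in (50, 100, 200):
    exact = prob_fully_autocatalytic(s, 1.0)
    estimate = math.exp(-alpha(1.0) * s)
    print(f"s={s:4d}  P={exact:.3e}  exp(-alpha*s)={estimate:.3e}")
```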

It is a noteworthy feature of self-organization in the present model that an organization
whose a priori probability to arise is exponentially small, $\sim e^{-\alpha s}$, arises inevitably in a rather
short time, $\sim (1/p) \ln s$ (for large $s$). Why does that happen? First a small ACS of size
$s_1(n_a) \sim O(1)$ forms by pure chance. The probability of this happening is not exponentially small; it
is in fact quite substantial. Once this has formed, it is a cooperative structure and is therefore
stable. Its appearance ushers in an exponential growth of structure with a time scale $\tau_g = 1/p$.
Hence a graph whose 'structuredness' (measured by the reciprocal of the probability of its
arising by pure chance) is $\sim e^{\alpha s}$ arises in only $\sim (1/p) \ln s$ steps.

As mentioned in the introduction, one of the major puzzles in the origin of life is the 
emergence of very special chemical organizations in a relatively short time. We hope that 
the mechanism described above, or its analogue in a sufficiently realistic model, will help in 
addressing this puzzle. The relevance of this mechanism for the origin of life is discussed in 
ref. We remark that other models of self-organization (e.g. the well-stirred hypercycle) 
do not seem to be able to produce complex structured organizations from a simple starting 
network (see ref. [23]).

Another graph theoretic measure of the structure of the evolved graph is 'interdependency' 
among the nodes, discussed in JT^ , pT] |. Like the links and si, the interdependency is low in 
the random phase, then rises in the growth phase to a value that is about an order of magnitude
higher.

0.5.3 The organized phase 

Once an ACS spans the entire graph the effective dynamics again changes although the microscopic
dynamical rules are unchanged. At spanning, for the first time since the formation
of the initial ACS, a member of the dominant ACS will be picked for removal. This is because
at spanning all nodes by definition belong to the dominant ACS and have nonzero relative
populations; one node nevertheless has to be picked for removal. Most of the time the removal
of the node with the least $X_i$ will result in minimal damage to the ACS. The rest of
the ACS will remain with high populations, and the new node will keep getting repeatedly
removed and replaced until it once again joins the ACS. Thus $s_1$ will fluctuate between $s$ and
$s - 1$ most of the time. However, once in a while, the node which is removed happens to be
playing a crucial role in the graph structure despite its low population. Then its removal can
trigger large changes in the structure and catastrophic drops in $s_1$ and $l$. Alternatively it can
sometimes happen that the new node added can trigger a catastrophe because of the new graph
structure it creates. The catastrophes and the mechanisms which cause them are the subject of
the next section.



0.6 Catastrophes and recoveries in the organized phase 

Figure 12 shows the same run as that of Figure 9b for $n = 1$ to $n = 50{,}000$. In this long run
one can see several sudden, large drops in $s_1$: catastrophes in which a large fraction of the
$s$ species become extinct. Some of the drops seem to take the system back into the random
phase, others are followed by recoveries in which $s_1$ rises back towards its maximum value $s$.
The recoveries are comparatively slower than the catastrophes, which in fact occur in a single
time step.

In order to understand what is happening during the catastrophes and subsequent recover- 
ies we begin by examining the possible changes that an addition or a deletion of a node can 
make to the core of the dominant ACS. 

Deletion of a node 

We have already seen how the deletion of a node can change the core - recall the discussion of 
keystone nodes in section 3: the removal of a keystone node results in a zero overlap between 
the cores of the dominant ACS before and after the removal. A zero core overlap means 
that a single graph update event (in which one of the least populated species is replaced by a 
randomly connected one) has caused a major reorganization of the dominant ACS: the cores 
of the dominant ACS before and after the event (if an ACS still exists) have not even a single 
link in common. We will call such events core-shifts. 
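
The notion of core overlap used here can be sketched in a few lines. The snippet assumes that the overlap of two cores is simply the number of directed links they share, as the phrase "not even a single link in common" suggests; the representation of a core as a set of (source, target) pairs, the helper names and the node labels are hypothetical.

```python
from typing import Iterable, List, Set, Tuple

Link = Tuple[int, int]  # (source node, target node)


def cycle_links(nodes: List[int]) -> Set[Link]:
    """Links of a directed cycle that visits `nodes` in the given order."""
    return {(nodes[i], nodes[(i + 1) % len(nodes)]) for i in range(len(nodes))}


def core_overlap(core_before: Iterable[Link], core_after: Iterable[Link]) -> int:
    """Number of links common to the cores before and after a graph update."""
    return len(set(core_before) & set(core_after))


def is_core_shift(core_before: Iterable[Link], core_after: Iterable[Link]) -> bool:
    """A core-shift is an update after which the two cores share no link at all."""
    return core_overlap(core_before, core_after) == 0


# Hypothetical example: a 2-cycle core replaced by an unrelated 3-cycle core.
old_core = cycle_links([1, 2])
new_core = cycle_links([4, 5, 6])
print(core_overlap(old_core, new_core), is_core_shift(old_core, new_core))
```

With this convention an update that leaves no ACS at all (an empty core afterwards) also counts as a core-shift, consistent with the parenthetical remark above.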

In an actual run a keystone node can only be removed if it happens to be one of the nodes
with the least $X_i$. However the core nodes are often 'protected' by having higher $X_i$. Why is
that?

$\mathbf{X}$ is an eigenvector of $C$ with eigenvalue $\lambda_1$. Therefore, when $\lambda_1 \neq 0$ it follows that for
nodes of the dominant ACS, $X_i = (1/\lambda_1) \sum_j c_{ij} X_j$. If node $i$ of the dominant ACS has only
one incoming link (from the node $j$, say) then $X_i = X_j/\lambda_1$; we can say that $X_i$ is 'attenuated'
with respect to $X_j$ by a factor $\lambda_1$. The periphery of an ACS is a tree-like structure emanating
from the core, and for small $p$ most periphery nodes have a single incoming link. For instance
the graph in Figure 10 whose $\lambda_1 = 1.31$ has a chain of nodes 44 → 45 → 24 → 29 →
52 → 89 → 86 → 54 → 78. The farther down such a chain a periphery node is, the lower
is its $X_i$ because of the cumulative attenuation. For such an ACS with $\lambda_1 > 1$ the 'leaves' of
the periphery tree (such as node 78) will typically be the species with least $X_i$ while the core
nodes will have larger $X_i$.

However, when $\lambda_1 = 1$ there is no attenuation. Recall that Proposition 1(iii) shows that at
$\lambda_1 = 1$ the core must be a cycle or a set of disjoint cycles, hence each core node has only one
incoming link within the dominant ACS. All core nodes have the same value of $X_i$. As one
moves out towards the periphery $\lambda_1 = 1$ implies there is no attenuation, hence each node in
the periphery that receives a single link from one of the core nodes will also have the same $X_i$.
Some periphery nodes may have higher $X_i$ if they have more than one incoming link from the
core. Iterating this argument as one moves further outwards from the core, it is clear that at
$\lambda_1 = 1$ the core is not protected and in fact will always belong to the set of least fit nodes if the
dominant ACS spans the graph. We have already seen in section 3 that when $\lambda_1 = 1$ and the
core is a single cycle every core node is a keystone node. Thus when $\lambda_1 = 1$ the organization
is fragile and susceptible to core-shifts caused by the removal of a keystone node.
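
The contrast between an attenuating core ($\lambda_1 > 1$) and a non-attenuating one ($\lambda_1 = 1$) can be illustrated on two toy graphs. This is a sketch under stated assumptions: the graphs are invented for illustration, the convention $c_{ij} = 1$ for a link from node $j$ to node $i$ is inferred from the eigenvector relation above, and the helper simply takes the eigenvector of the largest eigenvalue, which is adequate only when the Perron-Frobenius eigenvector is unique, as it is in both examples.

```python
import numpy as np


def adjacency(n_nodes, links):
    """Build C with c_ij = 1 when there is a link from node j to node i."""
    C = np.zeros((n_nodes, n_nodes))
    for src, dst in links:
        C[dst, src] = 1.0
    return C


def perron_frobenius(C):
    """Return (lambda_1, X) with X normalised so that its components sum to 1."""
    vals, vecs = np.linalg.eig(C)
    k = int(np.argmax(vals.real))
    v = vecs[:, k].real
    return float(vals[k].real), v / v.sum()


def clique_with_chain():
    """Core: nodes 0,1,2 all linked to each other (lambda_1 = 2); chain 2 -> 3 -> 4."""
    core = [(0, 1), (1, 0), (0, 2), (2, 0), (1, 2), (2, 1)]
    return adjacency(5, core + [(2, 3), (3, 4)])


def cycle_with_chain():
    """Core: 2-cycle of nodes 0,1 (lambda_1 = 1); chain 1 -> 2 -> 3."""
    return adjacency(4, [(0, 1), (1, 0), (1, 2), (2, 3)])


for name, C in (("lambda_1 > 1", clique_with_chain()), ("lambda_1 = 1", cycle_with_chain())):
    lam, X = perron_frobenius(C)
    print(name, "lambda_1 =", round(lam, 6), "X =", np.round(X, 4))
```

In the first graph each step down the chain divides $X_i$ by $\lambda_1 = 2$, so the leaf is the least populated node; in the second all nodes, core and periphery alike, end up with equal $X_i$, so the core nodes are among the least fit.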

Addition of a node

We now turn to the effects of the addition of a node to the dominant ACS. We will use the
notation $C'_n = C_{n-1} - k$ for the graph of $s - 1$ nodes just before the new node at time step
$n$ is brought in (and just after the least populated species $k$ is removed from $C_{n-1}$). $Q'_n$ will
stand for the core of $C'_n$. In the new attractor the new species $k$ may go extinct, i.e., $X_k$ may
be zero, or it may survive, i.e., $X_k$ is non-zero. If the new species goes extinct then it remains
in the set of least fit nodes and clearly there is no change to the dominant ACS. So we will
focus on events in which the new species survives in the new attractor.

Innovations 

We define an innovation to be a new node for which $X_k$ in the new attractor is nonzero, i.e.
a new node which survives till the next graph update [23]. This may seem to be a very weak
requirement, yet we will see that it has nontrivial consequences. A description of various
types of innovations and their consequences, with examples, is given in [30]. Here we present
a graph-theoretic classification of innovations (in terms of a hierarchy, see Figure 13).

The innovations which have the least impact on the populations of the species and the evolution
of the graph on a short time scale (of a few graph updates) are ones which do not affect
the core of the dominant ACS, if it exists. Such innovations are of three types (see boxes 1-3
in Figure 13):

1. Random phase innovations. These are innovations which occur in the random phase 
when no ACS exists in the graph, and they do not create any new ACSs. These innovations 
are typically short lived and have little short term or long term impact on the structure of the 
graph. 
2. Incremental innovations. These are innovations which occur in the growth and organized
phases, which add new nodes to the periphery of the dominant ACS without creating any new 
irreducible subgraph. In the short term they only affect the periphery and are responsible for 
the growth of the dominant ACS. In a longer term they can also affect the core as chains of 
nodes from the periphery join the core of the dominant ACS. 

3. Dormant innovations. These are innovations which occur in the growth and organized 
phases, which create new irreducible subgraphs in the periphery of the dominant ACS. These 
innovations also affect only the periphery in the short term. But they have the potential to 
cause core-shifts later if the right conditions occur (discussed in the next subsection). 

Innovations which do immediately affect the core of the existing dominant ACS are always
ones which create a new irreducible subgraph. They are also of three types (see boxes
4-6 in Figure 13):

4. Core enhancing innovations. These innovations result in the expansion of the existing
core by the addition of new links and nodes from the periphery or outside the dominant ACS.
They result in an increase of $\lambda_1$ of the graph.

5. Core-shifting innovations. These are innovations which cause an immediate core-shift,
often accompanied by the extinction of a large number of species.

6. Creation of the first ACS. This is an innovation which creates an ACS for the first time 
in a graph which till then had no ACSs. The innovation moves the system from the random 
phase to the growth phase, triggering the self organization of the system around the newly 
created ACS. 



Innovations of types 4, 5 and 6 which affect the core of the dominant ACS will be called 
core-transforming innovations. These innovations cause a substantial change in the vector 
of relative populations in a single graph update. Innovations of type 5 and 6 also make a 
qualitative change in the structure of the graph and significantly influence subsequent graph 
evolution. The following theorem makes precise the conditions under which a core transform- 
ing innovation can occur. 

Core transforming Theorem 

Let $N$ (or $N_n$ at time step $n$) denote the maximal new irreducible subgraph which includes
the new species. One can show that $N_n$ will become the new core of the graph, replacing the
old core $Q_{n-1}$, whenever either of the following conditions is true:

(a) $\lambda_1(N_n) > \lambda_1(Q'_n)$, or

(b) $\lambda_1(N_n) = \lambda_1(Q'_n)$ and $N_n$ is 'downstream' of $Q'_n$ (i.e., there is a path from $Q'_n$ to $N_n$ but
not from $N_n$ to $Q'_n$).

Such an innovation will fall into category 4 above if $Q_{n-1} \subset N_n$. However, if $Q_{n-1}$ and
$N_n$ are disjoint, we get a core-shift and the innovation is of type 5 if $Q_{n-1}$ is non-empty and
type 6 otherwise.
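
The conditions of the theorem translate directly into a small decision procedure. The sketch below is an illustration, not part of the model: the function name, the representation of cores as sets of node labels, and the use of a floating-point tolerance for the equality in condition (b) are all assumptions made here. Cases the text does not cover (partial overlap of the old core with $N_n$) are reported as unclassified.

```python
import math
from typing import FrozenSet


def classify_innovation(lam_new: float, lam_old_reduced: float,
                        new_is_downstream: bool,
                        old_core: FrozenSet[int],
                        new_irreducible: FrozenSet[int]) -> str:
    """Classify a surviving new node that created the irreducible subgraph N_n.

    lam_new          -- lambda_1(N_n)
    lam_old_reduced  -- lambda_1(Q'_n), the core of the graph after removing node k
    new_is_downstream-- True if there is a path from Q'_n to N_n but not back
    old_core         -- nodes of Q_{n-1}
    new_irreducible  -- nodes of N_n
    """
    equal = math.isclose(lam_new, lam_old_reduced, abs_tol=1e-9)
    condition_a = lam_new > lam_old_reduced and not equal
    condition_b = equal and new_is_downstream
    if not (condition_a or condition_b):
        return "not core-transforming"
    if not old_core:
        # No core existed before: creation of the first ACS.
        return "type 6"
    if old_core <= new_irreducible:
        return "type 4"
    if old_core.isdisjoint(new_irreducible):
        return "type 5"
    return "unclassified"


# The update at n = 6062 discussed later: equal lambda_1, new cycle downstream.
print(classify_innovation(1.0, 1.0, True,
                          frozenset({36, 74}), frozenset({60, 21, 41, 19, 73})))
```

The emptiness of the old core is tested first because an empty set is trivially both a subset of and disjoint from $N_n$; the text assigns that case to type 6.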



0.6.1 Catastrophes, core-shifts and a classification of proximate causes

The large sudden drops visible in Figure 12 are now discussed. Our first task is to see if the
large drops are correlated to specific changes in the structure of the graph. Let us focus on
those events in which more than 50% of the species go extinct. There were 701 such events
out of 1.55 million graph updates in a set of runs with $s = 100$, $p = 0.0025$. Figure 14 shows
a histogram of core overlaps $Ov(C_{n-1}, C_n)$ for these 701 events. 612 of these have zero core
overlap, i.e., they are core-shifts. If we now look at only those events in which more than 90%
of the species went extinct then we find 235 such events in the same runs, out of which 226 are
core-shifts. Clearly most of the large extinction events happen when there is a drastic change
in the structure of the dominant ACS - a core-shift.



Classification of core-shifts 

Using the insights from the above discussion of the effects of deletion or addition of a node,
we can classify the different mechanisms which cause core-shifts. Figure 15 differentiates
between the 612 core-shifts we observed amongst the 701 crashes. They fall into three categories
[23]: (i) complete crashes (136 events), (ii) takeovers by core-transforming innovations
(241 events), and (iii) takeovers by dormant innovations (235 events).
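
The counts quoted above can be turned into fractions with a few lines of arithmetic. The snippet uses only the numbers given in the text; the variable and function names are hypothetical.

```python
# Counts reported in the text for runs with s = 100, p = 0.0025.
GRAPH_UPDATES = 1_550_000
EVENTS_OVER_50 = 701          # more than 50% of species go extinct
CORE_SHIFTS_OVER_50 = 612
EVENTS_OVER_90 = 235          # more than 90% of species go extinct
CORE_SHIFTS_OVER_90 = 226
CORE_SHIFT_CAUSES = {
    "complete crash": 136,
    "takeover by core-transforming innovation": 241,
    "takeover by dormant innovation": 235,
}


def summarize():
    """Fractions derived from the reported counts."""
    return {
        "updates per >50% event": GRAPH_UPDATES / EVENTS_OVER_50,
        "core-shift share, >50% events": CORE_SHIFTS_OVER_50 / EVENTS_OVER_50,
        "core-shift share, >90% events": CORE_SHIFTS_OVER_90 / EVENTS_OVER_90,
        "cause shares": {k: v / CORE_SHIFTS_OVER_50 for k, v in CORE_SHIFT_CAUSES.items()},
    }


# The three categories should account for all 612 core-shifts.
assert sum(CORE_SHIFT_CAUSES.values()) == CORE_SHIFTS_OVER_50

for key, value in summarize().items():
    print(key, value)
```

About 87% of the events with more than 50% extinction, and about 96% of those with more than 90% extinction, are core-shifts.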

Complete crashes 

A complete crash is an event in which an ACS exists before but not after the graph update.
Such an event takes the system into the random phase. A complete crash occurs when a keystone
node is removed from the graph. For example at $n = 8232$ the graph had $\lambda_1 = 1$ and its
core was the simple 3-cycle of nodes 20, 50 and 54. As we have seen above, when the core is
a single cycle every core node is a keystone node and is also in the set of least fit nodes. At
$n = 8233$ node 54 was removed thus disrupting the 3-cycle. The resulting graph had no ACS
and $\lambda_1$ dropped to zero. As we have discussed earlier, graphs with $\lambda_1 = 1$ are the ones which
are most susceptible to complete crashes. This can be seen in Figure 15: every complete crash
occurred from a graph with $\lambda_1(C_{n-1}) = 1$.



Takeovers by core-transforming innovations 

An example of a takeover by a core-transforming innovation is given in Figures 10g,h. At
$n = 6061$ the core was a single loop comprising nodes 36 and 74. Node 60 was replaced by a
new species at $n = 6062$ creating a cycle comprising nodes 60, 21, 41, 19 and 73, downstream
from the old core. The graph at $n = 6062$ has one cycle feeding into a second cycle that is
downstream from it. We have already seen in section 3 (see the discussion of Example 4) that
for such a graph only the downstream cycle is populated and the upstream cycle and all nodes
dependent on it go extinct. Thus the new cycle becomes the new core and the old core goes
extinct resulting in a core-shift. This is an example of condition (b) for a core-transforming
innovation. For all such events in Figure 15, $\lambda_1(Q'_n) = \lambda_1(C_{n-1})$ since $k$ happened not to
be a core node of $C_{n-1}$. Thus these core-shifts satisfy $\lambda_1(C_n) = \lambda_1(N_n) \geq \lambda_1(Q'_n) =
\lambda_1(C_{n-1}) \geq 1$ in Figure 15.
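
The claim that only the downstream cycle remains populated can be checked on a toy graph of two 2-cycles joined by a single link. The sketch below is a hedged illustration: it assumes the underlying linear dynamics $\dot{y} = C y$ (the common dilution term drops out of the relative populations), integrates it with a crude Euler step, and renormalises at every step. The node labels, step size and number of steps are arbitrary choices made here.

```python
import numpy as np


def relative_populations(C, steps=50_000, h=0.1):
    """Iterate x <- x + h*C*x and renormalise; a rough stand-in for the attractor."""
    x = np.full(C.shape[0], 1.0 / C.shape[0])
    for _ in range(steps):
        x = x + h * (C @ x)
        x /= x.sum()
    return x


def two_cycles_in_series():
    """Upstream 2-cycle (nodes 0,1) feeding a downstream 2-cycle (nodes 2,3)."""
    C = np.zeros((4, 4))
    for src, dst in [(0, 1), (1, 0), (2, 3), (3, 2), (1, 2)]:
        C[dst, src] = 1.0  # c_ij = 1 for a link from node j to node i
    return C


x = relative_populations(two_cycles_in_series())
print("upstream share  :", x[0] + x[1])
print("downstream share:", x[2] + x[3])
```

Both cycles have $\lambda_1 = 1$, so in this sketch the upstream share decays only slowly, roughly in inverse proportion to the number of steps, rather than exponentially; it nevertheless tends to zero, in line with condition (b).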



Takeovers by dormant innovations 

We have earlier discussed dormant innovations, which create an irreducible structure in the 
periphery of the dominant ACS which does not affect its core at that time. For example the
2-cycle comprising nodes 36 and 74 formed at $n = 4696$. At a later time such a dormant
innovation can result in a core-shift if the old core gets sufficiently weakened.

In this case the core has become weakened by $n = 5041$, when it has $\lambda_1 = 1.24$. The
structure of the graph at this time is very similar to the graph in Figure 7a. Just as node 3 in
Figure 7a was a keystone node, here nodes 44, 85, and 98 are keystone nodes because removing
any of them results in a graph like Figure 7b, consisting of two 2-cycles, one downstream
from the other.

Indeed at $n = 5041$, node 85 is hit and the resulting graph at $n = 5042$ has a cycle (26
and 90) feeding into another cycle (36 and 74). Thus at $n = 5042$ nodes 36 and 74 form the
new core with only one other downstream node, 11, being populated. All other nodes become
depopulated resulting in a drop in $s_1$ by 97. A dormant innovation can take over as the new core
only following a keystone extinction which weakens the old core. In such an event the new
core necessarily has a lower (but nonzero) $\lambda_1$ than the old core, i.e., $\lambda_1(C_{n-1}) > \lambda_1(C_n) \geq 1$
(see Figure 15).

Note that 85 is a keystone node, and the graph is susceptible to a core-shift because of the 
innovation which created the cycle 36-74 earlier. If the cycle between 36 and 74 were absent, 
85 would not be a keystone species by our definition, since its removal would still leave part 
of the core intact (nodes 26 and 90). 



0.6.2 Recoveries 

After a complete crash the system is back in the random phase. In $O(s)$ graph updates each
node is removed and replaced by a randomly connected node, resulting in a graph as random as
the initial graph. Then the process starts again, with a new ACS being formed after an average
of $1/(p^2 s)$ time steps and then growing to span the entire graph after, on average, $(1/p) \ln(s/s_0)$
time steps, where $s_0$ is the size of the initial ACS that forms in this round (typically $s_0 = 2$).

After other catastrophes, an ACS always survives. In that case the system is in the growth
phase and immediately begins to recover, with $s_1$ growing exponentially on a timescale $1/p$.
Note that these recoveries happen because of innovations (mainly of types 2 and 4, and some
of type 3).
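
The exponential recovery can be mimicked with a very reduced stochastic process that keeps track only of $s_1$. This is a caricature, not the model itself: it assumes that at every graph update the new node joins the dominant ACS with probability $1 - (1-p)^{s_1}$ and that nothing else changes $s_1$, ignoring possibility 3 and all core-transforming events. Function names, the seed and the number of runs are arbitrary.

```python
import math
import random


def simulate_spanning_time(s, p, s_first=2, rng=None):
    """Number of graph updates until s_1 grows from s_first to s in the reduced process."""
    rng = rng or random.Random()
    s1, steps = s_first, 0
    while s1 < s:
        steps += 1
        # The new node joins if at least one of s1 possible incoming links exists.
        if rng.random() < 1.0 - (1.0 - p) ** s1:
            s1 += 1
    return steps


def mean_spanning_time(s, p, runs, seed=0):
    """Average spanning time over independent runs of the reduced process."""
    rng = random.Random(seed)
    return sum(simulate_spanning_time(s, p, rng=rng) for _ in range(runs)) / runs


s_demo, p_demo = 100, 0.0025
print("simulated mean :", mean_spanning_time(s_demo, p_demo, runs=200, seed=1))
print("(1/p) ln(s/s_0):", (1.0 / p_demo) * math.log(s_demo / 2))
```

The simulated mean should come out somewhat above the continuum estimate $(1/p)\ln(s/s_0)$, since the latter replaces a sum over discrete steps by an integral, but the two agree in order of magnitude and in their logarithmic dependence on $s$.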



0.6.3 Correlation between graph theoretic nature of perturbation and 
its short and long term impact 

In previous sections we have analysed several examples of perturbations to the system. These 
can be broadly placed in two classes based on their effect on $s_1$:

(i) 'Constructive perturbations' : these include the birth of a new organization (an innovation of 
type 6), the attachment of a new node to the core (an innovation of type 4) and an attachment 
of a new node to the periphery of the dominant ACS (an innovation of type 2). 

(ii) 'Destructive perturbations': these include complete crashes and takeovers by dormant 
innovations (both caused by the loss of a keystone node), and takeovers by core-transforming 
innovations (innovations of type 5). Note that the word 'destructive' is used only in the sense 
that several species go extinct on a short time scale (a single graph update in the present model)
after such a perturbation. In fact, over a longer time scale (ranging from a few to several
hundred graph updates in the run of Figure 9b), the 'destructive' takeovers by innovations
generally trigger a new round of 'constructive' events like incremental innovations (type 2)
and core enhancing innovations (type 4).

Note that the maximum upheaval is caused by those perturbations that introduce new irreducible
structures in the graph (innovations of type 4, 5 and 6) or those that destroy the
existing irreducible structure. For example the creation of the first ACS at $n = 2854$ triggered
the growth phase, a complete change in the effective dynamics of the system. Other examples
of large upheavals are core-shifts caused by a takeover by a core-transforming innovation at
$n = 6061$, takeover by a dormant innovation at $n = 5041$, and a complete crash at $n = 8233$.
In sections 2 and 3 we have mentioned that irreducibility is related to the existence of positive
feedback and cooperation, and the 'magnitude' of the feedback is measured by $\lambda_1$. While the
present model is a highly simplified model of evolving networks, we expect that this qualitative
feature, namely, the correlation between the dynamical impact of a perturbation and its
'structural' character embodied in its effect on the 'level of feedback' in the underlying graph,
will hold for several other complex systems.



0.7 Concluding remarks 

In this article we have attempted to show that a certain class of dynamical systems, those in 
which graphs coevolve with other dynamical variables living on them (in our example, living 
on the nodes of the graph), possess rich dynamical behaviour which is analytically and com- 
putationally tractable. Even in the highly idealized model discussed here, this behaviour is 
reminiscent of what happens in real life — birth of organizational structure characterized by 
interdependence of components, cooperation of parts of the organization giving way to com- 
petition, robust organizations becoming fragile, crashes and recoveries, innovations causing 
growth as well as collapse, etc. 



From the point of view of the origin of life problem the main conclusions are: 

(i) The model shows the emergence of an organization where none exists: a small ACS 
emerges spontaneously by random processes and then triggers the self-organization of the 
system. 

(ii) A highly structured organization, whose timescale of forming by pure chance is exponentially
large (as a function of the size of the system), forms in this model in a very short
timescale that grows only logarithmically with the size of the system. In [21] we have speculated
that this timescale may be $\sim 100$ million years for peptide based ACSs, which is in the
same ballpark as the timescale on which life is believed to have originated on the prebiotic
earth.

We remark that this speculation is not necessarily in conflict with, and is possibly comple- 
mentary to, some other approaches to the origin of life: 

(i) Complex autocatalytic organizations of polypeptides could enter into symbiosis with the
autocatalytic citric acid cycle proposed in [31]. The latter would help produce, among other
things, amino acid monomers needed by the former; the former would provide catalysts for
the latter.

(ii) It is conceivable that membranes (possibly lipid membranes, which have been argued to
have their own catalytic dynamics [32]) could form in regions where autocatalytic sets of the
kind discussed here existed, thereby surrounding complex molecular organization in an enclosure.
These 'cells' may have contained different parts of the ACS, thereby endowing them
with different fitnesses. Such an assembly could evolve.

(iii) It is also conceivable that such molecular organizations formed an enabling environment 
for self replicating molecules such as those needed for an RNA world. 

Testing some of these possibilities is a task for future models and experiments. Furthermore, 
the mathematical ideas and mechanisms discussed here might be relevant for these other ap- 
proaches also. 

The present model has a number of simplifying features which depart from realism but enhance
analytical tractability. One is the linearity of the population dynamics on a fixed graph.
Equation ([j]) is nonlinear, but since it originates via a nonlinear change of variables from a 
linear equation, equation (||), its attractors can be easily analysed in terms of the underlying 
linear system. The attractors are always fixed points, and are just the Perron-Frobenius eigenvectors
of the adjacency matrix of the graph. This allows us to use (static) graph theoretic
results for analysis of the dynamics. 

In this context it is helpful to note that while the population dynamics in the present model 
is essentially linear as long as the graph is fixed, the model feeds the result of the population 
dynamics into the subsequent graph update (the least populated node is removed). Thus over 
long time scales over which the graph changes, the 'coupling constants' $c_{ij}$ in equation ([!]) are
not constant but implicitly depend upon the $x_i$, thus making the evolution highly nonlinear.
By virtue of the simplifying device of widely separated time scales for the graph dynamics 
and the population dynamics (the population variables reach their attractor before the graph 
is modified), what we have is piecewise linear population dynamics. It is essentially linear 
between two graph updates, and nonlinear over longer time scales because of the intertwining 
of population dynamics and graph dynamics. This nonlinearity is essential for all the complex 
phenomena described above, while the short time scale linearity is an aid in analysis. It would 
be interesting to explore complex phenomena in models in which the short term population 
dynamics is also inherently nonlinear. This naturally arises in prebiotic chemistry when the
concentrations of the reactants (which are assumed buffered here) are dynamical variables in
addition to the catalysts and products, as well as in several other fields.
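
The separation of time scales described above can be summarised as a single graph update in code. What follows is a rough sketch under several assumptions that go beyond this section: the attractor is approximated by iterating the linear dynamics $\dot{y} = C y$ with renormalisation from a uniform start, ties among the least populated nodes are broken at random within a small tolerance, and the new node receives each incoming and outgoing link independently with probability $p$ and no self-link. All function names and numerical parameters are hypothetical.

```python
import numpy as np


def attractor(C, steps=20_000, h=0.1):
    """Approximate the relative populations reached on the fixed graph C."""
    n = C.shape[0]
    x = np.full(n, 1.0 / n)
    for _ in range(steps):
        x = x + h * (C @ x)   # linear population dynamics between graph updates
        x /= x.sum()          # keep relative populations normalised
    return x


def graph_update(C, p, rng, tol=1e-6):
    """Replace one of the least populated nodes by a randomly connected node."""
    x = attractor(C)
    least_fit = np.flatnonzero(x <= x.min() + tol)
    k = int(rng.choice(least_fit))
    s = C.shape[0]
    C_new = C.copy()
    C_new[k, :] = (rng.random(s) < p).astype(float)  # incoming links of new node
    C_new[:, k] = (rng.random(s) < p).astype(float)  # outgoing links of new node
    C_new[k, k] = 0.0                                # self-links are disallowed
    return C_new, k


# Tiny demonstration: a 2-cycle (nodes 0, 1) plus an isolated node 2.
rng = np.random.default_rng(0)
C0 = np.zeros((3, 3))
C0[1, 0] = C0[0, 1] = 1.0
C1, removed = graph_update(C0, p=0.3, rng=rng)
print("node replaced:", removed)
```

Each call is linear in the populations, yet the choice of the removed node depends on the outcome of those dynamics, which is where the long-time nonlinearity enters. For graphs without cycles, or with several cycles in series, the iteration converges slowly, so the tolerance-based choice of the least fit set is only approximate.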

The present model describes a well-stirred reactor; there are no spatial degrees of freedom.
This precludes a discussion of the origin of spatial structure and its consequences alluded to in
section 1. It is worthwhile to extend the model in that direction. Another issue is the generation
of novelty. Here the links of the new node are drawn from a fixed probability distribution.
In real systems this distribution depends upon the (history of) states of the system. A further 
direction for generalization consists in letting the two time scales of the population and graph 
dynamics, separated by hand in the present model, be endogenous. 
Figure 1: a. A directed graph with 20 nodes. b. The adjacency matrix of the graph in Figure
1a. c. A subgraph of the graph in Figure 1a. The adjacency matrix of the subgraph is the shaded
portion of the matrix in Figure 1b. d. Four Perron-Frobenius eigenvectors (PFEs) of the graph
in Figure 1a. The first three vectors have been divided by factors of 2, 2 and 6 respectively to
normalize them. e. The irreducible decomposition of the graph in Figure 1a into subgraphs $C_\alpha$,
with $\alpha = 1, 2, \ldots, 14$. Each of the 14 nodes of the graph in Figure 1e represents either an
irreducible subgraph of the graph in Figure 1a, or a single node that is not part of any irreducible
graph. The basic subgraphs of the graph in Figure 1a are represented by yellow nodes. The
dotted lines in Figure 1b demarcate the adjacency matrices corresponding to the subgraphs $C_\alpha$.
Colours identify the attractor of the dynamics discussed in section 3, except in Figure 1e. In
all graphs in the article (except Figure 1e), white nodes have zero relative population in the
attractor, $X_i = 0$, while blue and red nodes have $X_i > 0$. In graphs that have an autocatalytic
set, red nodes belong to the core of the dominant autocatalytic set of the graph, blue nodes to its
periphery, and white nodes are outside the dominant autocatalytic set.

Figure 2: Various autocatalytic sets (ACSs). a. A 1-cycle, the simplest ACS. b. A 2-cycle. c.
An ACS which is not an irreducible graph. d, e. Examples of ACSs which are irreducible graphs
but not cycles.
Figure 3: Example showing that the $\lambda_1$ of a PFE subgraph equals the $\lambda_1$ of the whole graph. a.
A directed graph with 6 nodes. b. $\mathbf{x}$ is an eigenvector of its adjacency matrix $C$ with eigenvalue
$\lambda_1 = 1$, which is the Perron-Frobenius eigenvalue of the graph. The nonzero components of
$\mathbf{x}$ and the corresponding rows and columns of $C$ are highlighted. c. The subgraph of the PFE
$\mathbf{x}$. d. The vector $\mathbf{x}'$ constructed by removing the zero components of $\mathbf{x}$ is an eigenvector of the
adjacency matrix, $C'$, of the PFE subgraph. Its corresponding eigenvalue is unity, which is also
the Perron-Frobenius eigenvalue of the PFE subgraph.

Figure 4: λ_1 is a measure of the multiplicity of internal pathways in the core of a simple PFE. 
Four irreducible graphs are shown. An irreducible graph always has a unique PFE that is simple 
and whose core is the entire graph. The Perron-Frobenius theorem ensures that adding a link to 
the core of a simple PFE necessarily increases its Perron-Frobenius eigenvalue λ_1. The figure 
also illustrates the concept of keystone nodes (see section 3). 
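
The statement that an extra link raises λ_1 can be illustrated with a small calculation. The sketch below estimates λ_1 of an irreducible graph by power iteration on C + I (the shift by the identity is a choice made here to avoid oscillation on pure cycles; it is not taken from the article). The two example graphs and the function name `perron_eigenvalue` are assumptions for illustration and are not the four graphs of the figure.

```python
def perron_eigenvalue(n, links, iterations=500):
    """Estimate lambda_1 of an irreducible graph with n nodes.

    `links` holds (source, target) pairs. Power iteration is run on C + I,
    whose dominant eigenvalue is lambda_1 + 1, and the shift is removed
    at the end.
    """
    C = [[0.0] * n for _ in range(n)]
    for source, target in links:
        C[target][source] = 1.0
    v = [1.0 / n] * n  # positive start vector with components summing to 1
    growth = 1.0
    for _ in range(iterations):
        w = [v[i] + sum(C[i][j] * v[j] for j in range(n)) for i in range(n)]
        growth = sum(w)  # equals the growth factor because sum(v) == 1
        v = [wi / growth for wi in w]
    return growth - 1.0


three_cycle = [(0, 1), (1, 2), (2, 0)]
three_cycle_plus_link = three_cycle + [(1, 0)]  # adds a second internal pathway

print(perron_eigenvalue(3, three_cycle))            # close to 1
print(perron_eigenvalue(3, three_cycle_plus_link))  # close to 1.3247
```

For the second graph λ_1 is the real root of λ^3 = λ + 1, so the single added link lifts λ_1 from 1 to about 1.32.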
Figure 5: Examples of graphs with a unique PFE. The subgraph of the PFE coincides with the 
nodes that are populated in the attractor. 
Figure 6: Examples of graphs with multiple PFEs. (a) e_1, e_2, e_3 are all eigenvectors with 
eigenvalue λ_1 = 0. Only e_3 is the attractor. Thus for generic initial conditions, only node 
7, which sits at the end point of the longest chain of nodes, is populated in the attractor. (b) 
e_1, e_2, e_3 are all eigenvectors with eigenvalue λ_1 = 1, but only e_3 is the attractor. Only the 
2-cycle of nodes 11 and 12, which sits at the end of the longest chain of cycles, is populated in 
the attractor. 




Figure 7: Example illustrating the notion of keystone species and the phenomenon of a core- 
shift. Node 3 is a keystone node of the graph in part a because its removal produces the 
graph in part b, which has zero core overlap with the graph in part a. The core nodes of both graphs are 
coloured red. An event in which the cores before and after the event have zero overlap 
is called a 'core-shift'. 
Figure 8: The number of links versus time (n) for various runs. Each run had s = 100. The 
black curve is a run with selection turned off; a random node is picked for removal at each graph 
update. The other curves show runs with selection turned on and with different p values: blue 
p = 0.001, red p = 0.0025, green p = 0.005. 






Figure 9: Number of populated nodes, s_1, (black curve) and the Perron-Frobenius eigenvalue 
of the graph, λ_1, (red curve) versus time, n, for the same three runs shown in Figure 8. Each 
run has s = 100 and p = 0.001, 0.0025 and 0.005 respectively. The λ_1 values shown are 100 
times the actual value. 
(a) n = 1 (b) n = 2854 (c) n = 3880 
Figure 10: Snapshots of the graph at various times for the run shown in Figure 9b with s = 100 
and p = 0.0025. See text for a description of the major events. In all graphs, white nodes are 
those with x_i = 0. All coloured nodes have x_i > 0. In graphs which have an ACS, the red 
nodes are core nodes and the blue nodes are periphery nodes. 
Figure 11: Each data point shows the average of τ_g (the growth timescale for an ACS) over 5 
different runs with s = 100 and the given p value. The error bars correspond to one standard 
deviation. The solid line is the best linear fit to the data points on a log-log plot. Its slope is 
consistent with the analytically predicted slope -1 (see the discussion of the growth phase in 
section 5). 
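
A best linear fit on a log-log plot, as used for Figure 11, amounts to ordinary least squares on (log p, log τ_g). The sketch below shows the calculation in plain Python. The data are synthetic placeholders generated from τ_g = 1/p exactly, purely to exercise the code; they are not the measured averages from the runs, and the function name `loglog_slope` is an assumption.

```python
import math


def loglog_slope(xs, ys):
    """Least-squares slope and intercept of log(y) against log(x)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    sxx = sum((a - mean_x) ** 2 for a in lx)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, ly))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Synthetic placeholder data (NOT simulation results): tau_g = 1 / p exactly.
p_values = [0.001, 0.0025, 0.005, 0.01]
tau_values = [1.0 / p for p in p_values]

slope, intercept = loglog_slope(p_values, tau_values)
print(slope)  # close to -1 for data following tau_g proportional to 1/p
```

With real averages the fitted slope would carry an uncertainty, which is why the caption describes the measured slope only as consistent with -1.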
Figure 12: The same run displayed in Figure 9b over a longer timescale, till n = 50000. This 
displays repeated rounds of crashes and recoveries. 
Figure 13: A hierarchy of innovations. Each node in this binary tree represents a class of node 
addition events. Each class has a name; the small box contains the mathematical definition of the 
class. All classes of events except the leaves of the tree are subdivided into two exhaustive and 
mutually exclusive subclasses (represented by the two branches emanating downwards from the 
class). The number of events in each class pertains to the run of Figure 9b with a total of 9999 
graph updates, between n = 1 (the initial graph) and n = 10000. In that run, out of 9999 
node addition events, most (8929 events) are not innovations. The rest (1070 events), which 
are innovations, are classified according to their graph theoretic structure. The classification is 
general; it is valid for all runs. x_k is the relative population of the new node in the attractor 
configuration of ([!]) that is reached in step 1 of the dynamics (see Section 4) immediately 
following the addition of that node. N stands for the new irreducible subgraph, if any, created 
by the new node. If the new node causes a new irreducible subgraph to be created, N is the 
maximal irreducible subgraph that includes the new node. If not, N = ∅ (where ∅ stands for 
the empty set). Q_i is the core of the graph just before the addition of the node (just before step 3 
of the dynamics in Section 4) and Q_f the core just after the addition of the node. The six leaves 
of the innovation subtree are numbered from 1 to 6 and correspond to the classes discussed in 
Section 6. The impact of each kind of innovation on the system dynamics is discussed in the 
text and in more detail in [?]. Some classes of events happen rarely (e.g., classes numbered 5 
and 6) but have a major impact on the dynamics of the system. The precise impact of all these 
classes of innovations on the system over a short time scale (before the next graph update) as 
well as their probable impact over the medium term (up to a few thousand graph updates) can be 
predicted from the graph theoretic structure of N and the rest of the graph at the moment these 
innovations appear in a run. 
Figure 14: Large crashes are predominantly core-shifts. A histogram of core overlaps for the 
701 events where s_1 dropped by more than s/2 observed in various runs with s = 100 and 
p = 0.0025, totalling 1.55 million iterations. 
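
The selection criterion behind Figure 14 (events where s_1 dropped by more than s/2) can be sketched as a scan over a time series of populated-node counts. The sketch assumes the drop is measured across a single graph update, which is one reading of the caption. The series below is made up for illustration and is not output of the model, and `large_crashes` is a hypothetical helper name.

```python
def large_crashes(s1_series, s):
    """Return the indices n at which s1 fell by more than s/2 from step n-1."""
    return [
        n
        for n in range(1, len(s1_series))
        if s1_series[n - 1] - s1_series[n] > s / 2
    ]


# Made-up trajectory of populated-node counts for s = 100 (not model output).
s1 = [100, 99, 100, 40, 60, 90, 100, 49, 100, 51]
crashes = large_crashes(s1, 100)
print(crashes)  # indices of the updates that count as large crashes
```

In this toy series the drops of 60 and 51 nodes qualify, while the final drop of 49 nodes does not because the threshold is strict.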
Figure 15: Classification of core-shifts into three categories. The graph shows the frequency, 
f, of the 612 core-shifts observed (see Figure 14) in a set of runs with s = 100 and p = 0.0025 
vs. the λ_1 values before, λ_1(C_{n-1}), and after, λ_1(C_n), the core-shift. Complete crashes (black; 
λ_1(C_{n-1}) = 1, λ_1(C_n) = 0), takeovers by core-transforming innovations (blue; λ_1(C_n) > 
λ_1(C_{n-1}) > 1) and takeovers by dormant innovations (red; λ_1(C_{n-1}) > λ_1(C_n) > 1) are 
distinguished. Numbers alongside vertical lines represent the corresponding f value. 



